An $n$-state deterministic finite automaton over a $k$-letter alphabet can be
seen as a digraph with $n$ vertices which all have $k$ labeled out-arcs. Grusho
[20] proved that whp in a random $k$-out digraph there is a strongly connected
component of linear size, i.e., a giant, and derived a central limit theorem.
We show that whp the part outside the giant contains at most a few short
cycles and mostly consists of tree-like structures, and present a new proof of
Grusho's theorem. Among other things, we pinpoint the phase transition for
strong connectivity.
1 Introduction

1.1 The model and the history 

The deterministic finite automaton (DFA) is widely used in computational complexity
theory. Formally, a DFA is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set called the
set of states, $\Sigma$ is a finite set called the alphabet, $\delta : Q \times \Sigma \to Q$ is the transition
function, $q_0 \in Q$ is the start state, and $F \subseteq Q$ is the set of accept states. If $q_0$ and $F$
are ignored, a DFA with $n$ states and a $k$-letter alphabet can be seen as a digraph with vertices
$[n] = \{1, \ldots, n\}$ in which each vertex has $k$ out-arcs labeled by $1, \ldots, k$ (a $k$-out digraph).
Note that such a digraph can have self-loops and multi-arcs. For a basic introduction to
DFAs and their applications, see [37].
Let $\mathcal{D}_{n,k}$ denote a digraph chosen uniformly at random from all $k$-out digraphs on $n$
vertices. Equivalently, $\mathcal{D}_{n,k}$ is a random $k$-out digraph on $n$ vertices with the endpoints
of its $kn$ arcs chosen independently and uniformly at random.
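
The second description translates directly into a sampler. The following is a minimal sketch and not part of the original text: the function name `random_k_out_digraph` and the 0-based indexing of vertices and arc labels are assumptions made for illustration.

```python
import random

def random_k_out_digraph(n, k, seed=None):
    """Sample a k-out digraph on vertices 0..n-1.

    arcs[v][j] is the endpoint of the arc that leaves v with label j
    (0-based). Each of the k*n endpoints is drawn independently and
    uniformly, so self-loops and multi-arcs may occur, as in the model.
    """
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(k)] for _ in range(n)]

arcs = random_k_out_digraph(n=8, k=2, seed=0)
print(arcs)  # eight rows, each listing the two endpoints of a vertex's out-arcs
```

Storing only the endpoints per label keeps the DFA reading intact: `arcs[v][j]` plays the role of the transition function applied to state `v` and letter `j`.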

When $k = 1$, $\mathcal{D}_{n,k}$ is equivalent to a uniform random mapping from $[n]$ to itself, which
has been well studied by Kolchin [27], Flajolet and Odlyzko [18], and Aldous and Pitman
[2]. In $\mathcal{D}_{n,1}$, the largest strongly connected component (SCC) has expected size $\Theta(\sqrt{n})$,
and so does the longest cycle. However, as shown later, for $k \ge 2$, the largest
SCC has expected size $\Theta(n)$.

From now on we assume that $k \ge 2$. Let $\mathcal{S}_v$ (the spectrum of $v$) be the set of vertices
in $\mathcal{D}_{n,k}$ that are reachable from vertex $v$, including $v$ itself. In 1973 Grusho [20] first
proved that $(|\mathcal{S}_1| - \nu_k n)/(\sigma_k \sqrt{n})$ converges in distribution to a standard normal, where $\nu_k$
and $\sigma_k$ are explicitly defined constants.

Given a set of vertices $\mathcal{S} \subseteq [n]$, call $\mathcal{S}$ closed if there are no arcs that start from vertices
in $\mathcal{S}$ and end at vertices in $\mathcal{S}^c = [n] \setminus \mathcal{S}$. Let $\mathcal{G}_n$ be the set of vertices in the largest
closed SCC in $\mathcal{D}_{n,k}$. (If the largest closed SCC is not unique, let $\mathcal{G}_n$ be the vertex set of the
largest closed SCC that contains the smallest vertex-label.) We call $\mathcal{G}_n$ the giant. Grusho
also proved that $|\mathcal{G}_n|$ has the same limit distribution as $|\mathcal{S}_1|$ by showing that with high
probability (whp) $\mathcal{G}_n$ is reachable from all vertices and that $|\mathcal{S}_1| - |\mathcal{G}_n| = o_p(\sqrt{n})$ (see [22]
for the notation). His proof relies on a result by Sevast'yanov [35] which approximates
the exploration of $\mathcal{S}_1$ with a Gaussian process.
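
The definition of the giant can be made concrete with a small sketch. It is an illustration only and not the method of any cited work: it uses the standard Kosaraju two-pass algorithm to find SCCs, and the names `strongly_connected_components`, `closed_components` and `giant`, as well as the `arcs[v][j]` representation with 0-based vertices, are assumptions.

```python
def strongly_connected_components(arcs):
    """Kosaraju's algorithm (iterative). Returns (comp, count), comp[v] = SCC id."""
    n = len(arcs)
    order, seen = [], [False] * n
    for s in range(n):                      # pass 1: finishing order
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            v, i = stack.pop()
            if i < len(arcs[v]):
                stack.append((v, i + 1))
                u = arcs[v][i]
                if not seen[u]:
                    seen[u] = True
                    stack.append((u, 0))
            else:
                order.append(v)
    rev = [[] for _ in range(n)]            # reversed digraph
    for v in range(n):
        for u in arcs[v]:
            rev[u].append(v)
    comp, count = [-1] * n, 0
    for s in reversed(order):               # pass 2: sweep the reversed digraph
        if comp[s] != -1:
            continue
        comp[s] = count
        stack = [s]
        while stack:
            v = stack.pop()
            for u in rev[v]:
                if comp[u] == -1:
                    comp[u] = count
                    stack.append(u)
        count += 1
    return comp, count

def closed_components(arcs):
    """Vertex sets of the SCCs that no arc leaves."""
    n = len(arcs)
    comp, count = strongly_connected_components(arcs)
    closed = [True] * count
    for v in range(n):
        for u in arcs[v]:
            if comp[u] != comp[v]:
                closed[comp[v]] = False
    return [{v for v in range(n) if comp[v] == c} for c in range(count) if closed[c]]

def giant(arcs):
    """Largest closed SCC; ties go to the one holding the smallest vertex label."""
    return max(closed_components(arcs), key=lambda c: (len(c), -min(c)))
```

Every vertex has out-degree $k \ge 1$, so at least one closed SCC always exists and `giant` is well defined.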

In 2012 Carayol and Nicaud [10] proved a local limit theorem for $|\mathcal{S}_1|$ by analyzing the
limit behavior of the probability that $|\mathcal{S}_1| = s$ for an $s$ close to $\nu_k n$. Their proof depends
on a theorem by Korshunov [28] which says that conditioned on every vertex having
in-degree at least one, the probability that $\mathcal{S}_1 = [n]$ tends to some constant. Carayol
and Nicaud derived a simple and explicit formula for this constant from their theorem.
(The same formula is also proved by Lebensztayn [29] with a more analytic approach
using Lagrange series.)

Lately the simple random walk (SRW) on $\mathcal{D}_{n,k}$ has gained some attention for its
applications in machine learning. Addario-Berry, Balle, and Perarnau [1] studied the
stationary distribution of the SRW by analyzing the distances in $\mathcal{D}_{n,k}$. They proved
that the diameter and the typical distance, rescaled by $\log n$, converge in probability
to explicit constants. Angluin and Chen [3] studied the rate of convergence to the
stationary distribution of the SRW. They also suggested an algorithm for learning a
uniformly random DFA under Kearns' statistical query model [26].

1.2 Our results and a sketch of proof 

A digraph can be uniquely decomposed into SCCs which form a directed acyclic graph
(DAG) through a process called condensation that contracts every SCC into a single vertex
while keeping all the arcs between SCCs [5]. We refer to the result as the condensation DAG of $\mathcal{D}_{n,k}$.

Let $\mathcal{G}_n^c = [n] \setminus \mathcal{G}_n$, i.e., $\mathcal{G}_n^c$ is the set of vertices that are outside the giant. The
structure of the condensation DAG depends on $\mathcal{D}_{n,k}[\mathcal{G}_n^c]$, the digraph induced by $\mathcal{G}_n^c$. Our analysis shows
that in $\mathcal{D}_{n,k}[\mathcal{G}_n^c]$ the total number of cycles and the number of cycles of a fixed length
both converge to Poisson distributions with constant means. So the number of cycles
and the length of the longest cycle are both $O_p(1)$ (see [22]). Furthermore, these cycles
are vertex-disjoint whp. Therefore, almost every vertex in $\mathcal{G}_n^c$ is an SCC by itself, and the condensation DAG is
very much like $\mathcal{D}_{n,k}$ with the giant contracted into a single vertex.

The $d$-core of an undirected graph is the maximum induced subgraph in which all
vertices have degree at least $d$. Similarly the $d$-in-core of a digraph can be defined as
the maximum induced sub-digraph in which all vertices have in-degree at least $d$. Let
$\mathcal{O}_n$ denote the set of vertices in the one-in-core of $\mathcal{D}_{n,k}$. Note that $\mathcal{G}_n \subseteq \mathcal{O}_n$ since an SCC
induces a sub-digraph with each vertex having in-degree at least one. Also note that
cycles cannot exist outside $\mathcal{O}_n$, for otherwise they contradict the maximality of $\mathcal{O}_n$. Now
assume that every vertex can reach $\mathcal{G}_n$, which happens whp by Grusho [20]. Then $\mathcal{D}_{n,k}$
can be divided into three layers: the center is $\mathcal{G}_n$; then comes $\mathcal{O}_n \setminus \mathcal{G}_n$, which consists
of cycles outside $\mathcal{G}_n$ and paths from these cycles to $\mathcal{G}_n$; the outermost is $\mathcal{O}_n^c = [n] \setminus \mathcal{O}_n$,
which is acyclic.



Figure 1: Three layers of $\mathcal{D}_{n,k}$: the giant $\mathcal{G}_n$; the one-in-core $\mathcal{O}_n$; and the whole graph.
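
The one-in-core can be computed by repeatedly deleting vertices of in-degree zero. The sketch below illustrates this standard peeling idea under stated assumptions: the function name `one_in_core` and the `arcs[v][j]` representation (endpoint of the arc leaving `v` with label `j`, 0-based) are hypothetical and not taken from the paper.

```python
from collections import deque

def one_in_core(arcs):
    """Vertex set of the maximum induced sub-digraph with in-degree >= 1."""
    n = len(arcs)
    indeg = [0] * n
    for v in range(n):
        for u in arcs[v]:
            indeg[u] += 1                   # multi-arcs count with multiplicity
    alive = [True] * n
    queue = deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        v = queue.popleft()
        alive[v] = False                    # v cannot belong to the core
        for u in arcs[v]:
            indeg[u] -= 1                   # arcs from deleted vertices vanish
            if indeg[u] == 0:
                queue.append(u)
    return {v for v in range(n) if alive[v]}

# 3 -> 0 -> 1 <-> 2: vertices 3 and 0 are peeled, the 2-cycle survives.
print(one_in_core([[1], [2], [1], [0]]))  # {1, 2}
```

The deleted vertices form the acyclic outer layer, and what survives is the giant together with the thin middle layer.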

Since there cannot be many vertices in cycles outside the giant, the middle layer
$\mathcal{O}_n \setminus \mathcal{G}_n$ must be very “thin”. Thus if we can prove that $(|\mathcal{O}_n| - \nu_k n)/\sqrt{n}$ converges to a
normal distribution, then we can also prove it for $|\mathcal{G}_n|$. The event $|\mathcal{O}_n| = s$ happens if
and only if there is a set of vertices $\mathcal{S}$ with $|\mathcal{S}| = s$ such that: (a) $\mathcal{D}_{n,k}[\mathcal{S}]$, the sub-digraph
induced by $\mathcal{S}$, has minimum in-degree at least one (surjective) and there are no arcs going from
$\mathcal{S}$ to $\mathcal{S}^c$ (closed), which we refer to as $\mathcal{S}$ being a $k$-surjection (since $\mathcal{D}_{n,k}[\mathcal{S}]$ is equivalent
to a surjective function from $[ks]$ to $[s]$); (b) $\mathcal{D}_{n,k}[\mathcal{S}^c]$ is acyclic. The probability of (a)
can be computed by counting the number of surjective functions. And we are able to
show that the probability of (b) converges to a constant. Note that for a fixed set $\mathcal{S}$,
(a) and (b) are independent because they depend on the endpoints of two disjoint sets
of arcs. Thus we can get the limit of $\mathbb{P}\{\mathcal{O}_n = \mathcal{S}\}$. Since the one-in-core of a digraph
is unique, $\mathbb{P}\{|\mathcal{O}_n| = s\} = \sum_{\mathcal{S} \subseteq [n] : |\mathcal{S}| = s} \mathbb{P}\{\mathcal{O}_n = \mathcal{S}\}$. Thus we can finish the proof by
computing the characteristic function of $(|\mathcal{O}_n| - \nu_k n)/\sqrt{n}$.

Note that although our formula for $\mathbb{P}\{|\mathcal{O}_n| = s\}$ is inspired by and resembles Carayol
and Nicaud's formula for $\mathbb{P}\{|\mathcal{S}_1| = s\}$, we actually prove the result from scratch without
relying on previous work. Since we are able to derive explicit expressions for all the
constants in our formula, the computation of the characteristic function becomes quite
simple. Furthermore, to our knowledge this is the first self-contained proof. Thus in
Section 2 we prove:

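Theorem 1 (Central limit law). Let $Z$ denote a standard normal random variable.
Then as $n \to \infty$,

\[
\frac{|\mathcal{O}_n| - \nu_k n}{\sigma_k \sqrt{n}} \xrightarrow{d} Z, \qquad
\frac{|\mathcal{G}_n| - \nu_k n}{\sigma_k \sqrt{n}} \xrightarrow{d} Z, \qquad
\frac{\max_{v \in [n]} |\mathcal{S}_v| - \nu_k n}{\sigma_k \sqrt{n}} \xrightarrow{d} Z,
\]

where $\nu_k$ and $\sigma_k$ are constants defined by

\[
\nu_k = \frac{\tau_k}{k}, \qquad \sigma_k^2 = \frac{\tau_k}{k e^{\tau_k} (1 - k e^{-\tau_k})},
\]

and $\tau_k$ is the unique positive solution of $1 - \tau_k/k - e^{-\tau_k} = 0$.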

Remark. Equivalently, $\nu_k$ is the unique positive solution of $1 - \nu_k - e^{-k \nu_k} = 0$ and

\[
\sigma_k^2 = \frac{\nu_k (1 - \nu_k)}{1 - k (1 - \nu_k)}.
\]
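
The constants of Theorem 1 are easy to evaluate numerically. The following is a sketch for illustration only: the names `tau` and `clt_constants` and the choice of bisection are assumptions, and the code merely evaluates the formulas stated above.

```python
import math

def tau(k, iterations=200):
    """Unique positive root of 1 - t/k - exp(-t) = 0 for k >= 2, by bisection."""
    h = lambda t: 1.0 - t / k - math.exp(-t)
    lo, hi = 1e-6, float(k)     # h(lo) > 0 for k >= 2 and h(k) = -exp(-k) < 0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clt_constants(k):
    """Return (tau_k, nu_k, sigma_k^2) as defined in Theorem 1."""
    t = tau(k)
    nu = t / k
    sigma2 = t / (k * math.exp(t) * (1.0 - k * math.exp(-t)))
    return t, nu, sigma2

for k in (2, 3, 4):
    t, nu, sigma2 = clt_constants(k)
    print(k, round(t, 4), round(nu, 4), round(sigma2, 4))
```

Since $h(t) = 1 - t/k - e^{-t}$ is concave with $h(0) = 0$ and $h'(0) > 0$, it is positive between $0$ and its positive root, which is why bisection on $(0, k]$ is safe.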

Let $G(n, m)$ be an Erdős-Rényi random graph, i.e., a graph chosen uniformly at random
from all graphs with $n$ vertices and $m$ edges [16]. It is well known that for $k > 1$,
$|C^{\max}_n|$, the size of the largest component in $G(n, m = nk/2)$, is $(\nu_k + o(1)) n$ whp.
Moreover, $(|C^{\max}_n| - \nu_k n)/\sqrt{n}$ also converges in distribution to a normal random variable
with variance $\sigma_k^2$ (see, e.g., Durrett [14]). Intuitively, this is because the in-degree of a
vertex in $\mathcal{D}_{n,k}$ has asymptotically a Poisson distribution of mean $k$. Thus a backward
exploration process from a vertex in $\mathcal{D}_{n,k}$ is approximately a Galton-Watson process with
survival probability $\nu_k$, as is the exploration process starting from a vertex in $G(n, m = nk/2)$.

Section 3 studies the part of $\mathcal{D}_{n,k}$ outside the giant, which determines the structure of
the condensation DAG and supports the proof of Theorem 1. Our results are summarized in two theorems,
where all our logarithms are natural:

Theorem 2 (Cycles outside the giant). We have:

(a) Let $L_n$ be the length of the longest cycle in $\mathcal{D}_{n,k}[\mathcal{G}_n^c]$. Then $L_n = O_p(1)$.

(b) Let $C_n$ be the number of cycles in $\mathcal{D}_{n,k}[\mathcal{G}_n^c]$. Then

\[
C_n \xrightarrow{d} \mathrm{Poi}\left( \log \frac{1}{1 - k e^{-\tau_k}} \right),
\]

where $\mathrm{Poi}(x)$ denotes the Poisson distribution with mean $x$.

(c) Let $C_{n,\ell}$ be the number of cycles of length $\ell$ in $\mathcal{D}_{n,k}[\mathcal{G}_n^c]$. Then for all fixed $\ell \ge 1$,

\[
C_{n,\ell} \xrightarrow{d} \mathrm{Poi}\left( \frac{(k e^{-\tau_k})^{\ell}}{\ell} \right).
\]

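Theorem 3 (Spectra outside the giant). Let $\mathcal{S}'_v = \mathcal{S}_v \cap \mathcal{G}_n^c$, i.e., $\mathcal{S}'_v$ is the spectrum of
$v$ in $\mathcal{D}_{n,k}[\mathcal{G}_n^c]$. Let $\mathrm{dist}(v, u)$ be the distance from $v$ to $u$, i.e., the length of the shortest
directed path from $v$ to $u$. Then

(a) $\mathbb{P}\{\cup_{v \in \mathcal{G}_n^c} [\mathrm{arc}(\mathcal{D}_{n,k}[\mathcal{S}'_v]) - |\mathcal{S}'_v| \ge 1]\} = o(1)$, where $\mathrm{arc}(\cdot)$ denotes the number of arcs.
In other words, whp every spectrum in $\mathcal{D}_{n,k}[\mathcal{G}_n^c]$ is a tree or a tree plus an extra arc.

(b) Let $S_n = \max_{v \in \mathcal{G}_n^c} |\mathcal{S}'_v|$. Let $\lambda_k = (k - \tau_k) \left( \frac{\tau_k}{k - 1} \right)^{k-1}$. Then

\[
\frac{S_n}{\log n} \xrightarrow{p} \frac{1}{\log(1/\lambda_k)}.
\]

(c) Let $W_n = \max_{v \in \mathcal{G}_n^c} \min_{u \in \mathcal{G}_n} \mathrm{dist}(v, u)$, i.e., the maximum distance to $\mathcal{G}_n$. Then

\[
\frac{W_n}{\log_k \log n} \xrightarrow{p} 1.
\]

(d) Let $M_n$ be the length of the longest path in $\mathcal{D}_{n,k}[\mathcal{G}_n^c]$. Then

\[
\frac{M_n}{\log n} \xrightarrow{p} \frac{1}{\log(e^{\tau_k}/k)}.
\]

(e) Let $D_n = \max_{v \in \mathcal{G}_n^c} \max_{u \in \mathcal{S}'_v} \mathrm{dist}(v, u)$. Then

\[
\frac{D_n}{\log n} \xrightarrow{p} \frac{1}{\log(e^{\tau_k}/k)}.
\]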
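
To get a feeling for the scales in Theorem 3, the limiting constants can be evaluated numerically. This is an illustrative sketch only: the names `tau` and `theorem3_constants` and the dictionary keys are assumptions, and the code just evaluates the formulas in parts (b), (d) and (e).

```python
import math

def tau(k, iterations=200):
    """Unique positive root of 1 - t/k - exp(-t) = 0 for k >= 2, by bisection."""
    h = lambda t: 1.0 - t / k - math.exp(-t)
    lo, hi = 1e-6, float(k)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def theorem3_constants(k):
    t = tau(k)
    lam = (k - t) * (t / (k - 1)) ** (k - 1)            # lambda_k from part (b)
    return {
        "lambda_k": lam,
        "spectrum_size": 1.0 / math.log(1.0 / lam),      # limit of S_n / log n
        "path_length": 1.0 / math.log(math.exp(t) / k),  # limit of M_n / log n and D_n / log n
    }

for k in (2, 3, 4):
    print(k, theorem3_constants(k))
```

Both limits are finite because $\lambda_k < 1$ and $k e^{-\tau_k} < 1$ for $k \ge 2$.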

The rest of the paper gives some other results regarding this model. Section 4 shows
that $\mathcal{D}_{n,k}$ exhibits a phase transition for strong connectivity. Section 5 extends some of
our results to simple $k$-out digraphs. Section 6 analyzes the typical distances in $\mathcal{D}_{n,k}$
with a technique called path counting, which is very different from the method used by
Addario-Berry et al. in [1]. Section 7 suggests some extensions of this model.

Remark. Lemma 9 shows that $|\mathcal{O}_n| - |\mathcal{G}_n| = O_p(1)$. The intuition is that a digraph
with minimum in-degree and out-degree at least one is likely to have a large SCC. This
phenomenon is also observed in $D(n, p)$, which is a random digraph on $n$ vertices with
each possible arc existing independently with probability $p$. Pittel and Poole [33, thm.
1.3] showed that in $D(n, p)$ the $(1, 1)$-core, i.e., the maximal induced sub-digraph in which
each vertex has in-degree and out-degree at least one, differs from the largest SCC in
size by at most $O((\log n)^8)$, whp. This intuition is also used for studying the asymptotic
counts of strongly connected digraphs (see Pérez-Giménez and Wormald [34], Pittel [32]).
2 The size of the one-in-core


2.1 The law of large numbers for the one-in-core 

To prove Theorem 1, we first need to narrow the range of $|\mathcal{O}_n|$ to values close to $\nu_k n$.

Theorem 4 (Law of large numbers). For all fixed $\delta \in (0, 1/2)$,

\[
\mathbb{P}\{|\mathcal{O}_n| \notin \mathcal{I}_n\} \le \frac{1 + o(1)}{n},
\]

where $\mathcal{I}_n = [\nu_k n - n^{1/2+\delta}, \nu_k n + n^{1/2+\delta}]$.

Thus $|\mathcal{O}_n|/n \xrightarrow{p} \nu_k$, which gives the theorem its name.

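Let $K_s$ be the number of $k$-surjections of size $s$ in $\mathcal{D}_{n,k}$. Then it suffices to show that
$\mathbb{P}\{\cup_{s \notin \mathcal{I}_n} [K_s \ge 1]\} \le (1 + o(1))/n$. As argued in the introduction, for a set of vertices
$\mathcal{S}$ to be the one-in-core, it must also be a $k$-surjection, i.e., every vertex in $\mathcal{D}_{n,k}[\mathcal{S}]$, the
sub-digraph induced by $\mathcal{S}$, must have in-degree at least one ($\mathcal{S}$ is surjective), and there
are no arcs going from $\mathcal{S}$ to $\mathcal{S}^c$ ($\mathcal{S}$ is closed). Thus

\[
\mathbb{P}\{\mathcal{S} \text{ is a $k$-surjection}\} = \mathbb{P}\{\mathcal{S} \text{ is surjective} \mid \mathcal{S} \text{ is closed}\}\, \mathbb{P}\{\mathcal{S} \text{ is closed}\}.
\]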


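Computing the limit of the two factors shows that:

Lemma 1. We have

\[
\mathbb{P}\left\{ \cup_{s \notin \mathcal{I}_n} [K_s \ge 1] \right\} \le \frac{1 + o(1)}{n}.
\]

And for $s \in \mathcal{I}_n$,

\[
\mathbb{E} K_s = (1 + o(1)) \frac{1}{\sqrt{2 \pi n}}\, g\left( \frac{s}{n} \right) \left[ f\left( \frac{s}{n} \right) \right]^{n},
\]

where

\[
g(x) = \frac{1}{\sqrt{(1 - k e^{-\tau_k})\, x (1 - x)}}, \qquad
f(x) = \frac{\left( \rho_k\, x^{k-1} \right)^{x}}{(1 - x)^{1 - x}},
\]

and $\rho_k = \left( \frac{k}{e \tau_k} \right)^{k} (e^{\tau_k} - 1)$.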


Theorem 4 follows immediately. The proof of Lemma 1 is postponed to the appendix. 
(The two functions f(x) and g(x) are also studied by Carayol and Nicaud [10].) 


2.2 The central limit law of the one-in-core 

In this section we prove the part of Theorem 1 about $|\mathcal{O}_n|$. The rest of the theorem
appears as corollaries in Section 3. Let $dO_n = |\mathcal{O}_n| - \nu_k n$. Then $dO_n$ takes values in
$[n] - \nu_k n = \{s : \nu_k n + s \in [n]\}$. As Theorem 4 shows, whp $|dO_n| \le n^{1/2+\delta}$ for all fixed
$\delta \in (0, 1/2)$. Thus it suffices to consider only the probability that $dO_n$ takes values in
the set

\[
\mathcal{J}_n = ([n] - \nu_k n) \cap [-n^{1/2+\delta}, n^{1/2+\delta}],
\]

for some fixed $\delta \in (0, 1/2)$. Thus the characteristic function of $dO_n/\sqrt{n}$ is

\begin{align*}
\varphi_n(t) &= \sum_{s \in ([n] - \nu_k n) \setminus \mathcal{J}_n} e^{its/\sqrt{n}}\, \mathbb{P}\{dO_n = s\} + \sum_{s \in \mathcal{J}_n} e^{its/\sqrt{n}}\, \mathbb{P}\{dO_n = s\} \\
&= o(1) + \sum_{s \in \mathcal{J}_n} e^{its/\sqrt{n}}\, \mathbb{P}\{dO_n = s\}.
\end{align*}

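Let $\mathcal{S}$ be a set of vertices with $|\mathcal{S}| = \nu_k n + s$ for some $s \in \mathcal{J}_n$. Recall that $\mathcal{O}_n = \mathcal{S}$ if
and only if $\mathcal{S}$ is a $k$-surjection and $\mathcal{D}_{n,k}[\mathcal{S}^c]$ is acyclic, two events that are independent.
By Theorem 5 in Section 3.2, $\mathbb{P}\{\mathcal{D}_{n,k}[\mathcal{S}^c] \text{ is acyclic}\} \sim 1 - k e^{-\tau_k}$. Also recall that $K_x$
counts the number of $k$-surjections of size $x$. It follows from Lemma 1 that

\begin{align*}
\mathbb{P}\{dO_n = s\} &= \sum_{\mathcal{S} \subseteq [n] : |\mathcal{S}| = \nu_k n + s} \mathbb{P}\{\mathcal{O}_n = \mathcal{S}\} \\
&= \sum_{\mathcal{S} \subseteq [n] : |\mathcal{S}| = \nu_k n + s} \mathbb{P}\{\mathcal{S} \text{ is a $k$-surjection}\} \times \mathbb{P}\{\mathcal{D}_{n,k}[\mathcal{S}^c] \text{ is acyclic}\} \\
&\sim (1 - k e^{-\tau_k})\, \mathbb{E} K_{\nu_k n + s} \\
&\sim (1 - k e^{-\tau_k}) \frac{1}{\sqrt{2 \pi n}}\, g\left( \nu_k + \frac{s}{n} \right) \left[ f\left( \nu_k + \frac{s}{n} \right) \right]^{n},
\end{align*}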

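where $K_x$, $f(x)$ and $g(x)$ are defined as in the previous subsection.
If $s \in \mathcal{J}_n$, then Lemma A6 in the appendix shows that

\[
g\left( \nu_k + \frac{s}{n} \right) = \left( 1 + O\left( \frac{s}{n} \right) \right) \frac{1}{\sigma_k (1 - k e^{-\tau_k})},
\]

and

\[
f\left( \nu_k + \frac{s}{n} \right) = \exp\left( -\frac{s^2}{2 \sigma_k^2 n^2} + O\left( \frac{s^3}{n^3} \right) \right).
\]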
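Therefore, choosing $\delta$ small enough, e.g., $\delta = 1/9$, we have

\begin{align*}
\varphi_n(t) &= o(1) + \sum_{s \in \mathcal{J}_n} e^{its/\sqrt{n}}\, \mathbb{P}\{dO_n = s\} \\
&= o(1) + \sum_{s \in \mathcal{J}_n} e^{its/\sqrt{n}} \frac{1}{\sqrt{2 \pi \sigma_k^2 n}} \exp\left( -\frac{s^2}{2 \sigma_k^2 n} \right) \\
&= o(1) + \frac{1}{\sqrt{2 \pi \sigma_k^2}} \int_{-n^{\delta}}^{n^{\delta}} e^{itx} \exp\left( -\frac{x^2}{2 \sigma_k^2} \right) dx \\
&= o(1) + \frac{1}{\sqrt{2 \pi \sigma_k^2}} \int_{-\infty}^{\infty} e^{itx} \exp\left( -\frac{x^2}{2 \sigma_k^2} \right) dx \\
&= o(1) + \exp\left( -\frac{\sigma_k^2 t^2}{2} \right).
\end{align*}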
Thus the characteristic function of $dO_n/\sqrt{n}$ converges to $\exp(-\sigma_k^2 t^2/2)$, the characteristic
function of $\sigma_k Z$. It follows from Lévy's continuity theorem that $dO_n/\sqrt{n}$ converges to
$\sigma_k Z$ in distribution. Note that using the estimates of this section, we actually have a
local limit theorem for $|\mathcal{O}_n|$.

3 The structure of the directed acyclic graph 

3.1 De-randomizing the giant 

Since an SCC induces a sub-digraph in which each vertex has in-degree at least one, a
closed SCC is also a $k$-surjection. Lemma 1 implies that whp all $k$-surjections are of sizes
in $\mathcal{I}_n = [\nu_k n - n^{1/2+\delta}, \nu_k n + n^{1/2+\delta}]$. When this happens, as $\nu_k > 1/2$ (Lemma A1), there
exists one and only one closed SCC and it is $\mathcal{G}_n$. And if $\mathcal{G}_n$ is the only closed SCC, then
every vertex must be able to reach it. This can be summarized as:

Lemma 2. Whp $|\mathcal{G}_n| \in \mathcal{I}_n$ and $\mathcal{G}_n$ is reachable from all vertices.

Since $e^{-\tau_k} = 1 - \tau_k/k = 1 - \nu_k$, the above lemma implies that whp $\left| |\mathcal{G}_n^c| - e^{-\tau_k} n \right| \le
n^{1/2+\delta}$. Thus the structure of $\mathcal{D}_{n,k}[\mathcal{G}_n^c]$, the sub-digraph induced by $\mathcal{G}_n^c = [n] \setminus \mathcal{G}_n$, should
be close to that of a sub-digraph induced by a fixed set of vertices whose size is close to
$e^{-\tau_k} n$. Formally, we have:

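Lemma 3. Let $f_n$ be a sequence of integer-valued functions on a sequence of digraphs.
Let $X$ be an integer-valued random variable. If there exists a sequence $\varepsilon_n \to 0$ such that

\[
\sup_{\mathcal{V}_n \subseteq [n] : |\mathcal{V}_n| \in \mathcal{I}_n} \left\| f_n(\mathcal{D}_{n,k}[\mathcal{V}_n^c]), X \right\|_{TV} \le \varepsilon_n,
\]

where $\mathcal{V}_n^c = [n] \setminus \mathcal{V}_n$ and $\|\cdot, \cdot\|_{TV}$ denotes the total variation distance, then

\[
f_n(\mathcal{D}_{n,k}[\mathcal{G}_n^c]) \xrightarrow{d} X.
\]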

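Proof. Define the event $E_n = [|\mathcal{G}_n| \in \mathcal{I}_n]$. Let $m$ be an integer, and let $\mathcal{V}_n \subseteq [n]$ be a fixed set
of vertices with $|\mathcal{V}_n| \in \mathcal{I}_n$. Recall that since $\nu_k > 1/2$, $|\mathcal{V}_n| > n/2$ for large $n$. Thus the
event $[\mathcal{G}_n = \mathcal{V}_n]$ depends only on the induced sub-digraph $\mathcal{D}_{n,k}[\mathcal{V}_n]$, which is independent
of $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$. Therefore the two events $[\mathcal{G}_n = \mathcal{V}_n]$ and $[f_n(\mathcal{D}_{n,k}[\mathcal{V}_n^c]) = m]$ are independent.

Using this observation and Lemma 2, we have

\begin{align*}
\mathbb{P}\{f_n(\mathcal{D}_{n,k}[\mathcal{G}_n^c]) = m\}
&= \mathbb{P}\{[f_n(\mathcal{D}_{n,k}[\mathcal{G}_n^c]) = m] \cap E_n^c\} + \mathbb{P}\{[f_n(\mathcal{D}_{n,k}[\mathcal{G}_n^c]) = m] \cap E_n\} \\
&= o(1) + \sum_{\mathcal{V}_n \subseteq [n] : |\mathcal{V}_n| \in \mathcal{I}_n} \mathbb{P}\{f_n(\mathcal{D}_{n,k}[\mathcal{V}_n^c]) = m \mid \mathcal{G}_n = \mathcal{V}_n\}\, \mathbb{P}\{\mathcal{G}_n = \mathcal{V}_n\} \\
&\le o(1) + \sum_{\mathcal{V}_n \subseteq [n] : |\mathcal{V}_n| \in \mathcal{I}_n} (\mathbb{P}\{X = m\} + \varepsilon_n)\, \mathbb{P}\{\mathcal{G}_n = \mathcal{V}_n\} \\
&\le o(1) + \mathbb{P}\{X = m\}.
\end{align*}

Similarly we have $\mathbb{P}\{f_n(\mathcal{D}_{n,k}[\mathcal{G}_n^c]) = m\} \ge \mathbb{P}\{X = m\} + o(1)$. Since this applies to all
integers $m$, $f_n(\mathcal{D}_{n,k}[\mathcal{G}_n^c]) \xrightarrow{d} X$. □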


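Corollary 1. Let $\mathcal{E}_n$ be a sequence of sets of digraphs. If there exists a sequence $\varepsilon_n \to 0$
such that

\[
\sup_{\mathcal{V}_n \subseteq [n] : |\mathcal{V}_n| \in \mathcal{I}_n} \mathbb{P}\{\mathcal{D}_{n,k}[\mathcal{V}_n^c] \notin \mathcal{E}_n\} \le \varepsilon_n,
\]

then whp $\mathcal{D}_{n,k}[\mathcal{G}_n^c] \in \mathcal{E}_n$.

Proof. This follows from the previous lemma by taking $X \equiv 1$ and $f_n$ to be the indicator
function that a digraph is in $\mathcal{E}_n$. □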

The rest of this section proves Theorem 2 and Theorem 3. But instead of working
on $\mathcal{G}_n^c$ directly, we prove similar theorems on fixed sets of vertices, and then apply the
above lemma or its corollary to get the final result.

3.2 Cycles outside the giant 

In this subsection, we show the following: 

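Theorem 5. Let $\omega_n \to \infty$ be an arbitrary sequence. There exists a sequence $\varepsilon_n = o(1)$
such that for all fixed sets of vertices $\mathcal{V}_n \subseteq [n]$ with $|\mathcal{V}_n| \in \mathcal{I}_n$, we have:

(a) Let $L^*_n$ be the length of the longest cycle in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$. Then $\mathbb{P}\{L^*_n > \omega_n\} \le \varepsilon_n$.

(b) The probability that $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$ contains vertex-intersecting cycles is at most $\varepsilon_n$.

(c) Let $C^*_{n,\ell}$ be the number of cycles of length $\ell$ in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$. Let $X_\ell = \mathrm{Poi}((k e^{-\tau_k})^{\ell}/\ell)$.
Then for all fixed $\ell$, $\|C^*_{n,\ell}, X_\ell\|_{TV} \le \varepsilon_n$.

(d) Let $C^*_n$ be the number of cycles in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$. Let $X = \mathrm{Poi}\left( \log \frac{1}{1 - k e^{-\tau_k}} \right)$. Then
$\|C^*_n, X\|_{TV} \le \varepsilon_n$. As a result, $|\mathbb{P}\{C^*_n = 0\} - (1 - k e^{-\tau_k})| \le 2 \varepsilon_n$.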
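
The Poisson means in parts (c) and (d) are linked by the series $\sum_{\ell \ge 1} x^{\ell}/\ell = \log \frac{1}{1-x}$ with $x = k e^{-\tau_k}$. The following sketch checks this numerically; it is an illustration only, and the names `tau` and `cycle_poisson_means` are assumptions.

```python
import math

def tau(k, iterations=200):
    """Unique positive root of 1 - t/k - exp(-t) = 0 for k >= 2, by bisection."""
    h = lambda t: 1.0 - t / k - math.exp(-t)
    lo, hi = 1e-6, float(k)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def cycle_poisson_means(k, max_len):
    """Means of X_1..X_max_len from part (c) and the mean of X from part (d)."""
    x = k * math.exp(-tau(k))                       # x = k e^{-tau_k} < 1
    per_length = [x ** l / l for l in range(1, max_len + 1)]
    total = math.log(1.0 / (1.0 - x))
    return per_length, total

per_length, total = cycle_poisson_means(k=2, max_len=200)
print(sum(per_length), total)   # the two numbers agree up to rounding
print(math.exp(-total))         # limiting probability of no cycle: 1 - k e^{-tau_k}
```

The last line is the constant $1 - k e^{-\tau_k}$ that part (d) identifies as the limiting probability that $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$ is acyclic.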

Theorem 2 follows from the above theorem and Lemma 3. Our proof is inspired by
Cooper and Frieze's work on the directed configuration model [12]. Note that the
Cooper-Frieze model is different from that studied by us. In their model, both in-degrees
and out-degrees are predetermined, whereas we require all out-degrees to be $k$
but the in-degrees are random.

The intuition behind Theorem 5 is that when two cycles share vertices, they contain
fewer vertices than arcs. So if we fix the “shape” of a pair of such cycles, the number of
ways to label them times the probability that they both exist is $o(1)$. Thus whp cycles
in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$ are vertex-disjoint and the total number of cycles has a distribution close to a sum
of independent indicator random variables.

In the following proof, instead of finding the exact $\varepsilon_n$, we derive implicit $o(1)$ upper
bounds for probabilities and total variation distances which only require that $|\mathcal{V}_n| \in \mathcal{I}_n$.

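Lemma 4. Let $\widetilde{C}_n = \sum_{1 \le \ell \le \omega_n} C^*_{n,\ell}$. Then $\mathbb{P}\{\widetilde{C}_n \ne C^*_n\} = o(1)$.

Proof. Define $(x)_\ell = x(x - 1) \cdots (x - \ell + 1)$. Then the number of all possible cycles of
length $\ell$ is $(|\mathcal{V}_n^c|)_\ell k^{\ell}/\ell$. (Note that we are also considering the labels on arcs, which makes
the counting easier.) And the probability that such a cycle exists is $n^{-\ell}$. Recalling that
$|\mathcal{V}_n^c| \in [e^{-\tau_k} n - n^{1/2+\delta}, e^{-\tau_k} n + n^{1/2+\delta}]$, we have

\[
\mathbb{E} C^*_{n,\ell} = \frac{(|\mathcal{V}_n^c|)_\ell\, k^{\ell}}{\ell\, n^{\ell}} \le \left( k e^{-\tau_k} \left( 1 + O(n^{-1/2+\delta}) \right) \right)^{\ell}. \tag{1}
\]

Since $k e^{-\tau_k} = k - \tau_k < 1$ (Lemma A1), there exists a constant $c_1 < 1$ such that the
above is less than $c_1^{\ell}$ for $n$ large enough. Since $\widetilde{C}_n \ne C^*_n$ if and only if $\sum_{\ell > \omega_n} C^*_{n,\ell} \ge 1$,

\[
\mathbb{P}\{\widetilde{C}_n \ne C^*_n\} = \mathbb{P}\left\{ \sum_{\ell > \omega_n} C^*_{n,\ell} \ge 1 \right\} \le \sum_{\ell > \omega_n} \mathbb{E} C^*_{n,\ell} \le \sum_{\ell > \omega_n} c_1^{\ell} = O(c_1^{\omega_n}) = o(1).
\]

□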


Since $L^*_n > \omega_n$ if and only if $\widetilde{C}_n \ne C^*_n$, part (a) of Theorem 5 follows. From now on
let $\omega_n = \log \log n$. We show that:

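Lemma 5. Let $X$ and $X_\ell$ be as in Theorem 5 and $\widetilde{C}_n$ as in Lemma 4. Then $\|\mathrm{Poi}(\mathbb{E} \widetilde{C}_n), X\|_{TV} = o(1)$. And for
all $\ell \le \omega_n$, $\|\mathrm{Poi}(\mathbb{E} C^*_{n,\ell}), X_\ell\|_{TV} = o(1)$.

Proof. For all $\ell \le \omega_n$, by (1) we have

\[
\mathbb{E} C^*_{n,\ell} = \frac{1}{\ell} \left( e^{-\tau_k} n + O(n^{1/2+\delta}) \right)_\ell k^{\ell} \left( \frac{1}{n} \right)^{\ell} = \frac{(k e^{-\tau_k})^{\ell}}{\ell} \left( 1 + O(\ell\, n^{-1/2+\delta}) \right).
\]

Thus

\[
\mathbb{E} \widetilde{C}_n = \sum_{1 \le \ell \le \omega_n} \mathbb{E}[C^*_{n,\ell}] = \log\left( \frac{1}{1 - k e^{-\tau_k}} \right) + o(1).
\]

Therefore $\mathbb{E} \widetilde{C}_n \to \mathbb{E} X$ and $\mathbb{E} C^*_{n,\ell} \to \mathbb{E} X_\ell$, which implies the lemma. □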

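Proof of Theorem 5. By the two previous lemmas, it suffices to show that

\[
\|\widetilde{C}_n, \mathrm{Poi}(\mathbb{E} \widetilde{C}_n)\|_{TV} = o(1), \qquad \|C^*_{n,\ell}, \mathrm{Poi}(\mathbb{E} C^*_{n,\ell})\|_{TV} = o(1) \text{ for all fixed } \ell.
\]

We prove this by using a theorem of Arratia et al. [4]. (A similar result is proved by
Barbour et al. [6].) The method is known as the Chen-Stein method because it was first
developed by Chen [11] who applied Stein's theory [38] on probability metrics to Poisson
distributions.

Let $\mathcal{C}$ be the space of all possible cycles of length at most $\omega_n$ in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$. For $\alpha \in \mathcal{C}$, let
$\mathcal{B}_\alpha \subseteq \mathcal{C}$ be the set of cycles that are vertex-intersecting with $\alpha$. Let $\mathbb{1}_\alpha$ be the indicator
that a cycle $\alpha$ appears in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$. Define

\[
b_1 = \sum_{\alpha \in \mathcal{C}} \sum_{\beta \in \mathcal{B}_\alpha} \mathbb{E} \mathbb{1}_\alpha\, \mathbb{E} \mathbb{1}_\beta, \qquad
b_2 = \sum_{\alpha \in \mathcal{C}} \sum_{\beta \in \mathcal{B}_\alpha \setminus \{\alpha\}} \mathbb{E}[\mathbb{1}_\alpha \mathbb{1}_\beta], \qquad
b_3 = \sum_{\alpha \in \mathcal{C}} s_\alpha,
\]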
aEC fiEBo, a£C a£C 
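where

\[
s_\alpha = \mathbb{E} \left| \mathbb{E}\left[ \mathbb{1}_\alpha \mid \sigma(\mathbb{1}_\beta : \beta \in \mathcal{C} \setminus \mathcal{B}_\alpha) \right] - \mathbb{E} \mathbb{1}_\alpha \right|,
\]

and $\sigma(\cdot)$ denotes the sigma algebra generated by $(\cdot)$. Theorem 1 of Arratia et al. [4]
states that

\[
\|\widetilde{C}_n, \mathrm{Poi}(\mathbb{E} \widetilde{C}_n)\|_{TV} \le 2 (b_1 + b_2 + b_3).
\]

If $\beta \in \mathcal{C} \setminus \mathcal{B}_\alpha$, then $\alpha$ and $\beta$ are vertex-disjoint. Thus $\mathbb{1}_\alpha$ and $\mathbb{1}_\beta$ are independent and
$s_\alpha = 0$ for all $\alpha \in \mathcal{C}$, i.e., $b_3 = 0$. It suffices to show that $b_1$ and $b_2$ are $o(1)$.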

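Let $|\alpha|$ denote the length of a cycle $\alpha$. Fix $\ell_1 \le \omega_n$ and $\ell_2 \le \omega_n$. There are at most
$|\mathcal{V}_n^c|^{\ell_1} k^{\ell_1}$ cycles of length $\ell_1$. For $|\alpha| = \ell_1$, there are at most $\ell_1 |\mathcal{V}_n^c|^{\ell_2 - 1} k^{\ell_2}$ cycles of length
$\ell_2$ that share at least one vertex with $\alpha$. Since $(|\mathcal{V}_n^c|)_\ell = (1 + o(1)) (e^{-\tau_k} n)^{\ell}$ for $\ell \le \omega_n$,

\begin{align*}
\sum_{\alpha \in \mathcal{C} : |\alpha| = \ell_1}\ \sum_{\beta \in \mathcal{B}_\alpha : |\beta| = \ell_2} \mathbb{E} \mathbb{1}_\alpha\, \mathbb{E} \mathbb{1}_\beta
&\le (1 + o(1)) \left[ (e^{-\tau_k} n)^{\ell_1} k^{\ell_1} \right] \left[ \ell_1 (e^{-\tau_k} n)^{\ell_2 - 1} k^{\ell_2} \right] \left( \frac{1}{n} \right)^{\ell_1 + \ell_2} \\
&= (1 + o(1)) \frac{\ell_1}{e^{-\tau_k} n} \left[ (k e^{-\tau_k})^{\ell_1} \right] \left[ (k e^{-\tau_k})^{\ell_2} \right].
\end{align*}

Therefore

\begin{align*}
b_1 &= \sum_{1 \le \ell_1 \le \omega_n}\ \sum_{1 \le \ell_2 \le \omega_n}\ \sum_{\alpha \in \mathcal{C} : |\alpha| = \ell_1}\ \sum_{\beta \in \mathcal{B}_\alpha : |\beta| = \ell_2} \mathbb{E} \mathbb{1}_\alpha\, \mathbb{E} \mathbb{1}_\beta \\
&\le (1 + o(1)) \frac{1}{e^{-\tau_k} n} \sum_{\ell_1 \ge 1} \sum_{\ell_2 \ge 1} \ell_1 \left[ (k e^{-\tau_k})^{\ell_1} \right] \left[ (k e^{-\tau_k})^{\ell_2} \right] \\
&= (1 + o(1)) \frac{1}{e^{-\tau_k} n} \left[ \sum_{\ell_1 \ge 1} \ell_1 (k e^{-\tau_k})^{\ell_1} \right] \left[ \sum_{\ell_2 \ge 1} (k e^{-\tau_k})^{\ell_2} \right],
\end{align*}

which is $O(1/n)$ since both sums converge.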

To compute $b_2$, we upper bound the number of pairs of vertex-intersecting cycles
that could possibly appear in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$ at the same time. Let $\alpha$ and $\beta$ be such a pair. Let
$V(\alpha)$, $A(\alpha)$, $V(\beta)$, $A(\beta)$ be the vertex sets and (labeled) arc sets of $\alpha$ and $\beta$ respectively.
Let $\alpha \cup \beta$ be the digraph with vertex set $V = V(\alpha) \cup V(\beta)$ and arc set $A = A(\alpha) \cup A(\beta)$.
Assume that $|V| = s$ and $|A| = s + t$. Note that as $\alpha$ and $\beta$ share at least one vertex,
$t \ge 1$. Since $V \subseteq [n]$, we can relabel the $s$ vertices in $\alpha \cup \beta$ with $[s]$ such that the order of
the vertex labels is maintained. The result is a digraph with vertex set $[s]$ and $s + t$ arcs
labeled with $[k]$. There are at most $(s^2)^{s+t} k^{s+t}$ such digraphs, since there are at most $s^2$
choices of endpoints and $k$ choices of labels for each of the $s + t$ arcs. Each digraph of
this type corresponds to at most $\binom{|\mathcal{V}_n^c|}{s} \le |\mathcal{V}_n^c|^{s}$ pairs of cycles like $\alpha$ and $\beta$. Thus there
where in the last step we use that $\omega_n = \log \log n$.

Thus part (d) of Theorem 5 for $C^*_n$ is proved. We can prove part (c) for $C^*_{n,\ell}$ using
the same method by limiting $\mathcal{C}$ to contain only cycles of a fixed length $\ell$. Note that the
above inequality shows that the probability that there exist vertex-intersecting cycles in
$\mathcal{D}_{n,k}[\mathcal{V}_n^c]$ is $o(1)$, thus part (b) is also proved. □

The method used above can be easily adapted to prove similar results for undirected
cycles, like the following lemma which is needed in the study of spectra in $\mathcal{D}_{n,k}[\mathcal{G}_n^c]$:

Lemma 6. Let $\omega_n \to \infty$ be an arbitrary sequence. There exists a sequence $\varepsilon_n = o(1)$
such that for all fixed sets of vertices $\mathcal{V}_n$ with $|\mathcal{V}_n| \in \mathcal{I}_n$, we have:

(a) The probability that $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$ contains an undirected cycle of length greater than $\omega_n$
is at most $\varepsilon_n$.

(b) The probability that $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$ contains vertex-intersecting undirected cycles is at most $\varepsilon_n$.

Proof. Let $U_\ell$ be the number of undirected cycles of length $\ell$ in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$. Then

\[
\mathbb{E}[U_\ell] \le \frac{|\mathcal{V}_n^c|^{\ell} (2k)^{\ell}}{n^{\ell}} \le \left( 2 k e^{-\tau_k} \left( 1 + O(n^{-1/2+\delta}) \right) \right)^{\ell},
\]

where the 2 comes from the fact that each edge in an undirected cycle has two possible
directions. Since $2 k e^{-\tau_k} = 2 (k - \tau_k) < 1$ (Lemma A1), with exactly the same argument as in
Lemma 4, we can show that $\sum_{\ell > \omega_n} \mathbb{E} U_\ell = o(1)$ for all $\omega_n \to \infty$. Thus (a) is proved.

Now choose $\omega_n = \log \log n$. Again we can show that whp there are no vertex-intersecting
undirected cycles of length at most $\omega_n$ by repeating the computation of
$b_2$ in the proof of Theorem 5 with $k e^{-\tau_k}$ replaced by $2 k e^{-\tau_k}$ in (2). □
3.3 Spectra outside the giant

In this section, we prove Theorem 3 (spectra outside the giant). Instead of working
on $\mathcal{G}_n^c$ directly, we again prove similar results on a fixed set of vertices and then apply
Lemma 3 to finish the proof.


3.3.1 The tree-like structure of some spectra

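We prove part (a) of Theorem 3. Let $\mathcal{V}_n \subseteq [n]$ with $|\mathcal{V}_n| \in \mathcal{I}_n = [\nu_k n - n^{1/2+\delta}, \nu_k n + n^{1/2+\delta}]$
be a fixed set of vertices. For $v \in \mathcal{V}_n^c = [n] \setminus \mathcal{V}_n$, let $\mathcal{S}^*_v$ be the spectrum of $v$ in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$,
the sub-digraph induced by $\mathcal{V}_n^c$. The following lemma shows that whp every spectrum
in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$ induces a sub-digraph that is a tree or a tree plus one extra arc: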

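Lemma 7. We have

\[
\sup_{\mathcal{V}_n \subseteq [n] : |\mathcal{V}_n| \in \mathcal{I}_n} \mathbb{P}\left\{ \cup_{v \in \mathcal{V}_n^c} \left[ \mathrm{arc}(\mathcal{D}_{n,k}[\mathcal{S}^*_v]) - |\mathcal{S}^*_v| \ge 1 \right] \right\} = o(1),
\]

where $\mathrm{arc}(\cdot)$ denotes the number of arcs.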

Proof. For $v \in \mathcal{V}_n^c$, if $\mathrm{arc}(\mathcal{D}_{n,k}[\mathcal{S}^*_v]) \ge |\mathcal{S}^*_v| + 1$, then $\mathcal{D}_{n,k}[\mathcal{S}^*_v]$ must contain at least two
undirected cycles. By Lemma 6, whp all undirected cycles in $\mathcal{D}_{n,k}[\mathcal{S}^*_v]$ are vertex-disjoint.
Therefore, if $\mathcal{D}_{n,k}[\mathcal{S}^*_v]$ contains two undirected cycles, then whp they are vertex-disjoint
and connected by an undirected path.

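Let $X_{r,s,t}$ be the number of pairs of undirected cycles of length $r$ and $s$ respectively
that are connected by an undirected path of length $t$. In such a structure the number
of arcs is $r + s + t$ while the number of vertices is $r + s + t - 1$. Since $|\mathcal{V}_n| \in \mathcal{I}_n$, we have
$|\mathcal{V}_n^c| = n - |\mathcal{V}_n| \in [e^{-\tau_k} n - n^{1/2+\delta}, e^{-\tau_k} n + n^{1/2+\delta}]$. Thus

\[
\mathbb{E} X_{r,s,t} \le |\mathcal{V}_n^c|^{r+s+t-1} (2k)^{r+s+t} \left( \frac{1}{n} \right)^{r+s+t} \le O\left( \frac{1}{n} \right) \left( 2 k e^{-\tau_k} + \frac{2k}{n^{1/2-\delta}} \right)^{r+s+t}.
\]

Summing over all possible $r$, $s$ and $t$ shows that

\[
\sum_{1 \le r \le n}\ \sum_{1 \le s \le n}\ \sum_{1 \le t \le n} \mathbb{E} X_{r,s,t} \le O\left( \frac{1}{n} \right) \left[ \sum_{r \ge 1} \sum_{s \ge 1} \sum_{t \ge 1} \left( 2 k e^{-\tau_k} + \frac{2k}{n^{1/2-\delta}} \right)^{r+s+t} \right],
\]

which is $o(1)$ since the sum in the brackets converges. □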


3.3.2 The maximum size of spectra 

This section proves part (b) of Theorem 3 (the sizes of spectra outside the giant). 
Lemma 8. Let $\varepsilon > 0$ be a constant. Then

\[
\sup_{\mathcal{V}_n \subseteq [n] : |\mathcal{V}_n| \in \mathcal{I}_n} \mathbb{P}\left\{ \left| \frac{\max_{v \in \mathcal{V}_n^c} |\mathcal{S}^*_v|}{\log n} - \frac{1}{\log(1/\lambda_k)} \right| > \varepsilon \right\} = o(1),
\]

where $\lambda_k = (k - \tau_k) \left( \frac{\tau_k}{k - 1} \right)^{k-1}$.

The exploration of $\mathcal{D}_{n,k}[\mathcal{S}^*_v]$ can be coupled with a colouring process. Initially, colour
all vertices in $\mathcal{V}_n$ green, all vertices in $\mathcal{V}_n^c$ yellow, and all arcs white. Then:

(i) Colour the vertex $v$ black, and colour the $k$ arcs that start from $v$ red. (Red arcs
start from vertices in $\mathcal{S}^*_v$ but their endpoints are not determined yet.)

(ii) Pick an arbitrary red arc. Choose its endpoint uniformly at random from all the
$n$ vertices. Colour this arc with the colour of its chosen endpoint vertex. (So a
yellow arc goes to a vertex that is not already in $\mathcal{S}^*_v$, a black arc goes to a vertex
that is already in $\mathcal{S}^*_v$.) If the chosen vertex is yellow, colour this vertex black and
colour all its arcs red.

(iii) If there are no red arcs left, terminate. Otherwise go to the previous step.

In the end, $\mathcal{S}^*_v$ consists of all black vertices, and arcs that start from vertices in $\mathcal{S}^*_v$ have
one of three colours: green arcs go to $\mathcal{V}_n$; yellow arcs form a spanning tree of $\mathcal{D}_{n,k}[\mathcal{S}^*_v]$
rooted at $v$; black arcs connect vertices in $\mathcal{S}^*_v$ but they are not part of the yellow spanning
tree, so they are in cycles in $\mathcal{D}_{n,k}[\mathcal{S}^*_v]$. Figure 2 depicts the colouring process.

Figure 2: The colouring process. (Legend: green, black, yellow and red.)
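
Steps (i) to (iii) can be sketched as a short simulation. This is an illustration only and not code from the paper: the function name `explore_spectrum`, its arguments, and the decision to track only the number of red arcs (rather than the individual arcs and their colours) are assumptions.

```python
import random

def explore_spectrum(n, k, yellow, v, rng):
    """Run the colouring process from v.

    `yellow` is the set V_n^c (it must contain v); every other vertex of
    range(n) is green. Returns (black, coloured): the spectrum of v inside
    V_n^c and the number T of arcs whose endpoint was revealed.
    """
    yellow = set(yellow)
    yellow.discard(v)          # step (i): v turns black, its k arcs turn red
    black = {v}
    red = k
    coloured = 0
    while red > 0:             # step (iii): stop when no red arc is left
        red -= 1               # step (ii): reveal the endpoint of one red arc
        coloured += 1
        u = rng.randrange(n)
        if u in yellow:        # yellow endpoint: it turns black, k new red arcs
            yellow.remove(u)
            black.add(u)
            red += k
    return black, coloured

rng = random.Random(1)
black, T = explore_spectrum(n=1000, k=2, yellow=range(200), v=0, rng=rng)
print(len(black), T)           # T always equals k * len(black)
```

Each black vertex contributes exactly $k$ arcs and every one of them is eventually coloured, which is the identity between the number of coloured arcs and $k |\mathcal{S}^*_v|$ used in the analysis that follows.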


We use random variables $R_t$ and $Y_t$ to track the number of red arcs and yellow vertices
after the $t$-th red arc is coloured. Thus $R_0 = k$ and $Y_0 = |\mathcal{V}_n^c| - 1$. When a red arc is
coloured, if a yellow vertex is chosen as its endpoint, then the number of red arcs increases
by $k - 1$ and the number of yellow vertices decreases by one. Otherwise the number of
red arcs decreases by one and the number of yellow vertices remains unchanged. Thus
for $t \ge 1$,

\[
R_t = R_{t-1} + k \xi_t - 1 = k \sum_{i=1}^{t} \xi_i - (t - k), \qquad \text{and} \qquad Y_t = Y_{t-1} - \xi_t = |\mathcal{V}_n^c| - 1 - \sum_{i=1}^{t} \xi_i,
\]

where, given the past, $\xi_t$ is Bernoulli $Y_{t-1}/n$ (the probability that a yellow vertex is chosen).
Let $T = \min\{t : R_t \le 0\}$. Then $|\mathcal{S}^*_v| = T/k$, since $T$ is the total number of arcs that have
been coloured and $|\mathcal{S}^*_v|$ is the total number of vertices that have been coloured black.

Let $(\overline{\xi}_t)_{t \ge 1}$ be i.i.d. Bernoulli $(e^{-\tau_k} + n^{-1/2+\delta})$. Since $Y_t/n \le |\mathcal{V}_n^c|/n \le e^{-\tau_k} + n^{-1/2+\delta}$,
we have $\overline{\xi}_t \succeq \xi_t$, where $\succeq$ denotes stochastically greater than (see [36]). Therefore
there exists a coupling such that $\overline{\xi}_t \ge \xi_t$ for all $t$ almost surely. Let $\overline{T} = \min\{t :
k \sum_{i=1}^{t} \overline{\xi}_i - (t - k) \le 0\}$. Then $\overline{T} \ge T$ almost surely. (The random variable $\overline{T}$ is called
the total progeny of a Galton-Watson process with offspring distribution $k \overline{\xi}_1$. For an
introduction to Galton-Watson processes see [13].) It is well known that if $\mathbb{E}[k \overline{\xi}_1] < 1$, which
is true in this case, then $\mathbb{E} \overline{T} = k/(1 - k\, \mathbb{E} \overline{\xi}_1) = O(1)$. Thus $\mathbb{E} T = O(1)$.

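Proof of the upper bound. Let $\omega_n = \lfloor (1 + \varepsilon) \log n / \log(1/\lambda_k) \rfloor + 1$. Since $\overline{T} \ge T$,

\[
\mathbb{P}\{T > k \omega_n\} \le \mathbb{P}\{\overline{T} > k \omega_n\} \le \mathbb{P}\left\{ \frac{\sum_{i=1}^{k \omega_n} \overline{\xi}_i}{k \omega_n} \ge \frac{1}{k_n} \right\},
\]

where $k_n = k \omega_n/(\omega_n - 1)$. Hoeffding [21] showed that

\[
\mathbb{P}\left\{ \frac{\mathrm{Bin}(m, p)}{m} \ge p + x \right\} \le \left[ \left( \frac{p}{p + x} \right)^{p + x} \left( \frac{1 - p}{1 - p - x} \right)^{1 - p - x} \right]^{m},
\]

where $\mathrm{Bin}(m, p)$ denotes a binomial $(m, p)$ random variable. Recalling that $\mathbb{E} \overline{\xi}_1 = e^{-\tau_k} +
n^{-1/2+\delta} = 1 - \tau_k/k + n^{-1/2+\delta}$ and $\lambda_k = (k - \tau_k) \left( \frac{\tau_k}{k - 1} \right)^{k-1}$, it follows from Hoeffding's
inequality that $\mathbb{P}\{\overline{T} > k \omega_n\}$ is at most

\begin{align*}
\left[ \left( k_n\, \mathbb{E} \overline{\xi}_1 \right)^{1/k_n} \left( \frac{1 - \mathbb{E} \overline{\xi}_1}{1 - 1/k_n} \right)^{1 - 1/k_n} \right]^{k \omega_n}
&= O\left( \left[ (k - \tau_k) \left( \frac{\tau_k}{k - 1} \right)^{k-1} + O(n^{-1/2+\delta}) \right]^{\omega_n} \right) \\
&= O(\lambda_k^{\omega_n}) \left( 1 + O(n^{-1/2+\delta}) \right)^{\omega_n} \\
&= O(n^{-(1+\varepsilon)}). \tag{3}
\end{align*}

Since $k |\mathcal{S}^*_v| = T$, by the union bound

\[
\mathbb{P}\left\{ \cup_{v \in \mathcal{V}_n^c} \left[ |\mathcal{S}^*_v| > \omega_n \right] \right\} \le n\, \mathbb{P}\{T > k \omega_n\} = O(n^{-\varepsilon}). \qquad □
\]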

Proof of the lower bound. Let $\psi_n = \lceil (1 - \varepsilon) \log n / \log(1/\lambda_k) \rceil$. To show that whp there
exists a $v$ such that $|\mathcal{S}^*_v| \ge \psi_n$, pick an arbitrary yellow vertex and run the colouring
process. If at least $\psi_n$ vertices are coloured black (success) in the process then terminate.
Otherwise (failure) pick another yellow vertex and repeat the colouring process until
one trial succeeds. If the colouring process is repeated at most $t_n = \lfloor n/(\log n)^3 \rfloor$
times, then at most $a_n = t_n \psi_n = O(n/(\log n)^2)$ vertices are coloured black in the end.
Therefore, the probability that the number of red arcs increases after colouring one red
arc is at least $(|\mathcal{V}_n^c| - a_n)/n$.

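Let $(\underline{\xi}_i)_{i \ge 1}$ be i.i.d. Bernoulli $(|\mathcal{V}_n^c| - a_n - \psi_n)/n$. Let $\underline{T} = \min\{t : k \sum_{i=1}^{t} \underline{\xi}_i - (t -
k) \le 0\}$. Then in each of the first $t_n$ iterations, the probability of a success is at least
$\mathbb{P}\{\underline{T} \ge k \psi_n\} \ge \mathbb{P}\{\underline{T} = k \psi_n\}$. (For a detailed proof, see van der Hofstad's discussion of
the Erdős-Rényi model [39, chap. 4.2.2].) By the hitting-time theorem of Galton-Watson
processes [41],

\[
\mathbb{P}\{\underline{T} = k \psi_n\} = \frac{1}{\psi_n}\, \mathbb{P}\left\{ \sum_{i=1}^{k \psi_n} \underline{\xi}_i = \psi_n - 1 \right\}.
\]

Since $\sum_{i=1}^{k \psi_n} \underline{\xi}_i$ is a binomial random variable, the above equals

\[
\frac{1}{\psi_n} \binom{k \psi_n}{\psi_n - 1} \left( \frac{|\mathcal{V}_n^c| - a_n - \psi_n}{n} \right)^{\psi_n - 1} \left( 1 - \frac{|\mathcal{V}_n^c| - a_n - \psi_n}{n} \right)^{k \psi_n - (\psi_n - 1)}.
\]

By Stirling's approximation [17, p. 407],

\[
\binom{k \psi_n}{\psi_n - 1} = \Theta\left( \frac{1}{\sqrt{\psi_n}} \right) \left[ \frac{k}{(1 - 1/k)^{k-1}} \right]^{\psi_n}.
\]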

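Recalling that $a_n = O(n/(\log n)^2)$ and $\psi_n = \lceil (1-\varepsilon)\log n/\log(1/\lambda_k)\rceil$, we have, in view of $|\mathcal{V}_n^c| = e^{-\tau_k}n + O(n^{1/2+\delta})$,

\[
\left(\frac{|\mathcal{V}_n^c| - a_n - \psi_n}{n}\right)^{\psi_n-1} = \left(e^{-\tau_k} - O\left(\frac{1}{(\log n)^2}\right)\right)^{\psi_n-1} = \Theta\left(e^{-\tau_k\psi_n}\right),
\]

and

\[
\left(1 - \frac{|\mathcal{V}_n^c| - a_n - \psi_n}{n}\right)^{k\psi_n-(\psi_n-1)} = \left(1 - e^{-\tau_k} + O\left(\frac{1}{(\log n)^2}\right)\right)^{k\psi_n-(\psi_n-1)} = \Theta\left(\left(\frac{\tau_k}{k}\right)^{(k-1)\psi_n}\right).
\]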

Recall that $e^{-\tau_k} = 1 - \tau_k/k$. Therefore

\[
\lambda_k = (k-\tau_k)\left(\frac{\tau_k}{k-1}\right)^{k-1} = ke^{-\tau_k}\left(\frac{\tau_k}{k-1}\right)^{k-1}.
\]

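Putting everything together, we have

\[
b_n := \mathbb{P}\{T = k\psi_n\}
= \Theta\left(\frac{1}{\psi_n}\frac{1}{\sqrt{\psi_n}}\left[\frac{k}{(1-1/k)^{k-1}}\,e^{-\tau_k}\left(\frac{\tau_k}{k}\right)^{k-1}\right]^{\psi_n}\right)
= \Theta\left(\frac{\lambda_k^{\psi_n}}{\psi_n^{3/2}}\right)
= \Theta\left(\frac{n^{-1+\varepsilon}}{\psi_n^{3/2}}\right).
\]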


So the probability that all the first $t_n = \lfloor n/(\log n)^3\rfloor$ trials fail is at most

\[
(1-b_n)^{t_n} \le \exp\{-b_n t_n\} = \exp\left\{-\Omega\left(\frac{n^{\varepsilon}}{(\log n)^{9/2}}\right)\right\} = o(1). \qquad \square
\]





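By Lemma 2, whp $\mathcal{G}_n$ is reachable from all vertices. When this happens, $\mathcal{O}_n \setminus \mathcal{G}_n$ consists of vertices either on cycles in $\mathcal{D}_{n,k}[\mathcal{G}_n^c]$ or on paths from these cycles to $\mathcal{G}_n$. Since the number of such cycles and the length of the longest one of them are both $O_p(1)$, Lemma 8 implies that $|\mathcal{O}_n| - |\mathcal{G}_n| = O_p(\log n)$. Thus

\[
\frac{|\mathcal{G}_n| - \nu_k n}{\sigma_k\sqrt{n}} = \frac{|\mathcal{O}_n| - \nu_k n}{\sigma_k\sqrt{n}} + O_p\left(\frac{\log n}{\sqrt{n}}\right) \xrightarrow{d} Z,
\]

which is the second part of Theorem 1.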

In fact we can show that $|\mathcal{O}_n| - |\mathcal{G}_n| = O_p(1)$. This seems obvious, since in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$ the expected size of a spectrum is $O(1)$ and the number of cycles is $O_p(1)$. However, it is not trivial because $\mathbb{1}[v \text{ is on a cycle}]$ and $|\mathcal{S}_v^*|$ are not independent. For a proof using Cayley's formula, see Lemma 9 in the next section (Section 3.3.3).

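We can also use Lemma 8 to show that

\[
\frac{\max_{v\in[n]}|\mathcal{S}_v \setminus \mathcal{G}_n|}{\log n} \xrightarrow{p} \frac{1}{\log(1/\lambda_k)},
\]

which finishes the last part of Theorem 1, i.e., $(\max_{v\in[n]}|\mathcal{S}_v| - \nu_k n)/(\sigma_k\sqrt{n}) \xrightarrow{d} Z$. Let $A_n$ be the event that every vertex can reach $\mathcal{G}_n$. Assuming $A_n$ happens, $\mathcal{G}_n \subseteq \mathcal{S}_v$ for all $v \in [n]$. Thus for all $\varepsilon > 0$,

\[
\mathbb{P}\left\{\left|\frac{\max_{v\in[n]}|\mathcal{S}_v\setminus\mathcal{G}_n|}{\log n} - \frac{1}{\log(1/\lambda_k)}\right| > \varepsilon\right\}
\le \mathbb{P}\left\{\left[\left|\frac{\max_{v\in[n]}|\mathcal{S}_v^*|}{\log n} - \frac{1}{\log(1/\lambda_k)}\right| > \varepsilon\right] \cap A_n\right\} + \mathbb{P}\{A_n^c\} = o(1).
\]

Since $|\mathcal{S}_1| \le \max_{v\in[n]}|\mathcal{S}_v|$ and whp $|\mathcal{S}_1| \ge |\mathcal{G}_n|$, we also recover Grusho's central limit law of $|\mathcal{S}_1|$.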


3.3.3 The size of the middle layer 

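Lemma 9 and Corollary 1 imply that $|\mathcal{O}_n| - |\mathcal{G}_n| = O_p(1)$.

Lemma 9. Let $\omega_n \to \infty$ be an arbitrary sequence of nonnegative numbers. Then

\[
\sup_{\mathcal{V}_n\subseteq[n]:\,|\mathcal{V}_n|\in\mathcal{I}_n} \mathbb{P}\left\{\sum_{v\in\mathcal{C}(\mathcal{V}_n^c)}|\mathcal{S}_v^*| \ge \omega_n\right\} = o(1),
\]

where $\mathcal{C}(\mathcal{V}_n^c)$ denotes the set of vertices on cycles in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$, and $\mathcal{S}_v^*$ is the spectrum of $v$ in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$, the sub-digraph induced by $\mathcal{V}_n^c$.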


Proof. By Theorem 5 and Lemma 7, in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$ whp: (a) there are at most $\sqrt{\omega_n}$ vertices on cycles, i.e., $|\mathcal{C}(\mathcal{V}_n^c)| \le \sqrt{\omega_n}$; (b) every $\mathcal{S}_v^*$ induces either a tree or a tree plus one extra arc; (c) $\max_{v\in\mathcal{V}_n^c}|\mathcal{S}_v^*| = O(\log n)$. Now assume all these events happen. If $\sum_{v\in\mathcal{C}(\mathcal{V}_n^c)}|\mathcal{S}_v^*| \ge \omega_n$, then (a) implies there is at least one vertex $v \in \mathcal{C}(\mathcal{V}_n^c)$ with $|\mathcal{S}_v^*| \ge \sqrt{\omega_n}$. By (b), $\mathcal{S}_v^*$ induces a sub-digraph that consists of exactly one cycle and isolated trees with their roots on this cycle. If $|\mathcal{S}_v^*| = \ell$, we call the induced sub-digraph an $\ell$-eye. Note that by (c) there are no $\ell$-eyes with $\ell > (\log n)^2$.

Figure 3: The leftmost shaded part of this figure is an $\ell$-eye.

Let $S \subseteq \mathcal{V}_n^c$ with $|S| = \ell$ be a set of vertices. If $S$ induces an $\ell$-eye $\mathcal{D}_\ell$, then there are $\ell$ arcs that start and end at specific vertices in $S$ decided by $\mathcal{D}_\ell$, which happens with probability $(1/n)^{\ell}$. If $S = \mathcal{S}_v^*$ for some vertex $v \in S$, call $S$ a partial spectrum. For $S$ to be a partial spectrum, the other $(k-1)\ell$ arcs that start from $S$ must end in $\mathcal{V}_n$, which happens with probability $(|\mathcal{V}_n|/n)^{(k-1)\ell}$. So the probability that $S$ induces a fixed $\mathcal{D}_\ell$ and $S$ is a partial spectrum is $(1/n)^{\ell}(|\mathcal{V}_n|/n)^{(k-1)\ell}$.

By Cayley's formula [7], there are $\ell^{\ell-1}$ ways that $S$ can form a rooted tree. In such a tree, there are at most $\ell^2$ ways to add an extra arc to make it an $\ell$-eye. In a vertex-labeled $\ell$-eye, there are at most $k^{\ell}$ ways to label the arcs. So the number of $\ell$-eyes that can be induced by $S$ is less than $\ell^{\ell-1}\ell^2 k^{\ell}$. And there are $\binom{|\mathcal{V}_n^c|}{\ell}$ ways to choose $S$.

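Let $X_\ell$ be the number of $\ell$-eyes induced by partial spectra. Recall that $\nu_k = \tau_k/k = 1 - e^{-\tau_k}$. Thus $|\mathcal{V}_n| \in \mathcal{I}_n = [\nu_k n - n^{1/2+\delta}, \nu_k n + n^{1/2+\delta}]$ implies that $|\mathcal{V}_n^c| \le e^{-\tau_k}n + n^{1/2+\delta}$. So for $\ell \le (\log n)^2$, by the above arguments,

\[
\begin{aligned}
\mathbb{E}X_\ell &\le \binom{|\mathcal{V}_n^c|}{\ell}\,\ell^{\ell-1}\ell^{2}k^{\ell}\left(\frac{1}{n}\right)^{\ell}\left(\frac{|\mathcal{V}_n|}{n}\right)^{(k-1)\ell} \\
&\le \frac{(e^{-\tau_k}n + n^{1/2+\delta})^{\ell}}{\ell!}\,\ell^{\ell+1}\left(\frac{k}{n}\right)^{\ell}\left(\nu_k + n^{-1/2+\delta}\right)^{(k-1)\ell} \\
&\le \frac{\ell^{\ell+1}}{(\ell/e)^{\ell}}\left[k\left(e^{-\tau_k} + n^{-1/2+\delta}\right)\left(\nu_k + n^{-1/2+\delta}\right)^{k-1}\right]^{\ell} \\
&= \left(1 + O(\ell n^{-1/2+\delta})\right)\ell\left(ke^{1-\tau_k}\nu_k^{k-1}\right)^{\ell} \\
&= \left(1 + O(\ell n^{-1/2+\delta})\right)\ell\rho_k^{\ell}.
\end{aligned}
\]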

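By Lemma A1, $\rho_k < 1$. Since $\sqrt{\omega_n} \to \infty$,

\[
\sum_{\sqrt{\omega_n}\le\ell\le(\log n)^2}\mathbb{E}X_\ell \le \left(1 + O\left(\frac{(\log n)^2}{n^{1/2-\delta}}\right)\right)\sum_{\ell\ge\sqrt{\omega_n}}\ell\rho_k^{\ell} = o(1).
\]

Thus whp there are no $\ell$-eyes induced by partial spectra with $\ell \in [\sqrt{\omega_n}, (\log n)^2]$. □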


3.3.4 The distance to the giant 

This subsection proves part (c) of Theorem 3. 

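Lemma 10. For all $\varepsilon > 0$,

\[
\sup_{\mathcal{V}_n\subseteq[n]:\,|\mathcal{V}_n|\in\mathcal{I}_n}\mathbb{P}\left\{\left|\frac{\max_{v\in\mathcal{V}_n^c}W_v^*}{\log_k\log n} - 1\right| > \varepsilon\right\} = o(1),
\]

where $W_v^* = \min_{u\in\mathcal{V}_n}\operatorname{dist}(v,u)$, i.e., $W_v^*$ is the length of the shortest path from $v$ to $\mathcal{V}_n$.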

Let $v \in \mathcal{V}_n^c$ be a vertex. If $W_v^* > 1$, then all neighbors of $v$ are in $\mathcal{V}_n^c$, and most likely there are $k$ of them. So $\mathbb{P}\{W_v^* > 1\} \approx (|\mathcal{V}_n^c|/n)^{k} \approx e^{-\tau_k k}$. If $W_v^* > 2$, then the neighbors of $v$'s neighbors are all in $\mathcal{V}_n^c$, and most likely there are $k^2$ of them. So $\mathbb{P}\{W_v^* > 2\} \approx (|\mathcal{V}_n^c|/n)^{k+k^2} \approx e^{-\tau_k(k+k^2)}$. Repeating this argument shows that $\mathbb{P}\{W_v^* > x\} \approx \exp\{-\tau_k(k + k^2 + \cdots + k^x)\} = e^{-\tau_k\Theta(k^x)}$, which is $o(1/n)$ when $x \ge (1+\varepsilon)\log_k\log n$.
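The heuristic above can be turned into a quick back-of-the-envelope calculation. The sketch below is only an illustration of the informal estimate, not of the rigorous proof: it finds the smallest depth $x$ at which $n\exp\{-\tau_k(k + \cdots + k^x)\}$ drops below 1, and the value of `tau_k` passed in is assumed to have been computed separately (for instance, $\tau_2 \approx 1.5936$). The function name is hypothetical.

```python
import math


def heuristic_depth(n: int, k: int, tau_k: float) -> int:
    """Smallest x with n * exp(-tau_k * (k + k^2 + ... + k^x)) < 1.

    This mirrors the informal union-bound heuristic: beyond this depth the
    expected number of vertices v with W_v^* > x is below one.
    """
    x, arcs = 0, 0
    while True:
        x += 1
        arcs += k ** x  # number of arcs in the first x layers of a full k-ary tree
        if n * math.exp(-tau_k * arcs) < 1.0:
            return x


if __name__ == "__main__":
    n, k, tau_2 = 10 ** 6, 2, 1.5936  # tau_2 is an assumed numerical value
    print(heuristic_depth(n, k, tau_2), math.log(math.log(n), k))
```

For $n = 10^6$ and $k = 2$ the threshold depth is 3, to be compared with $\log_2\log n \approx 3.8$; the two agree only up to the $1 + o(1)$ factor that the lemma allows.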

To make the above intuition rigorous, the colouring process defined in the previous 
subsection needs to be slightly modified. Let v be the vertex where the process has 
started. When choosing a red arc to colour, instead of choosing one arbitrarily from all 
red arcs, choose one arbitrarily from those that are closest to v. Thus at the end, the 
yellow arcs consist of not just a spanning tree but a breadth-first-search (bfs) spanning 
tree of $\mathcal{D}_{n,k}[\mathcal{S}_v^*]$. If $\mathcal{V}_n$ (the set of green vertices) is contracted into a single green vertex, then the green arcs together with the yellow arcs form a DAG. Let $\mathcal{T}_v$ denote this DAG. Then $W_v^*$ is the length of the shortest path from $v$ to the green vertex contracted from $\mathcal{V}_n$. Figure 4 shows an example of $\mathcal{T}_v$.



Figure 4: An example of $\mathcal{T}_v$. (Legend: green, black, yellow.)


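Proof. Let $\omega_n = \lfloor (1+\varepsilon)\log_k\log n\rfloor$. Call the arcs whose endpoints are at distance $i$ to $v$ the $i$-th layer of $\mathcal{T}_v$. The event $W_v^* > \omega_n$ implies that the first $\omega_n$ layers of arcs in $\mathcal{T}_v$ are all yellow arcs and thus they form a tree of height $\omega_n$. By Lemma 7, whp there are no $v \in \mathcal{V}_n^c$ such that $\mathcal{D}_{n,k}[\mathcal{S}_v^*]$ contains more than one black arc. Thus whp in every $\mathcal{T}_v$ all internal (non-leaf) vertices except at most one have out-degree $k$. Let $A_n$ denote this event. Assuming $A_n$ happens, $W_v^* > \omega_n$ implies that there are at least $\Theta(k^{\omega_n}) = \Theta((\log n)^{1+\varepsilon})$ yellow arcs in the first $\omega_n$ layers of $\mathcal{T}_v$. Thus in the colouring process, the first $\Theta((\log n)^{1+\varepsilon})$ arcs choose their endpoints in $\mathcal{V}_n^c$. The probability that this happens is at most $(|\mathcal{V}_n^c|/n)^{\Theta((\log n)^{1+\varepsilon})}$. Since $|\mathcal{V}_n| \in \mathcal{I}_n$, $|\mathcal{V}_n^c| = n - |\mathcal{V}_n| \le e^{-\tau_k}n + n^{1/2+\delta}$. Then by the union bound,

\[
\begin{aligned}
\mathbb{P}\left\{\bigcup_{v\in\mathcal{V}_n^c}[W_v^* > \omega_n]\right\}
&\le \sum_{v\in\mathcal{V}_n^c}\mathbb{P}\{[W_v^* > \omega_n]\cap A_n\} + \mathbb{P}\{A_n^c\} \\
&\le n\left(|\mathcal{V}_n^c|/n\right)^{\Theta((\log n)^{1+\varepsilon})} + o(1) \\
&\le n\left(e^{-\tau_k} + n^{-1/2+\delta}\right)^{\Theta((\log n)^{1+\varepsilon})} + o(1) = o(1).
\end{aligned}
\]

Thus whp $\max_{v\in\mathcal{V}_n^c}W_v^* \le \omega_n$.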

Let $\psi_n = \lceil (1-\varepsilon)\log_k\log n\rceil$. To show that whp there is a vertex $v$ with $W_v^* \ge \psi_n$, run the colouring process starting from an arbitrary yellow vertex $v$ until either an arc is colored black or green (failure), or the first $\psi_n - 1$ layers of $\mathcal{T}_v$ are colored yellow (success). So to succeed, the first $\psi_n - 1$ layers of $\mathcal{T}_v$ must form a full $k$-ary tree, i.e., the first $k + k^2 + \cdots + k^{\psi_n-1} = \Theta(k^{\psi_n}) = \Theta((\log n)^{1-\varepsilon})$ arcs must be colored yellow. If the process fails, we pick another yellow vertex and try again until one trial succeeds. Since the colouring process stops before colouring the $\psi_n$-th layer of $\mathcal{T}_v$, each trial colors at most $\Theta(k^{\psi_n}) = \Theta((\log n)^{1-\varepsilon})$ vertices black. If the process is tried at most $\lceil n/(\log n)^2\rceil$ times, then at most $b_n = \lceil n/(\log n)^2\rceil\,\Theta((\log n)^{1-\varepsilon}) = O(n/(\log n)^{1+\varepsilon})$ vertices are colored black. Therefore, each arc has probability at least $(|\mathcal{V}_n^c| - b_n)/n$ to be colored yellow during the first $\lceil n/(\log n)^2\rceil$ trials. Since $|\mathcal{V}_n| \in \mathcal{I}_n$, $|\mathcal{V}_n^c| = n - |\mathcal{V}_n| \ge e^{-\tau_k}n - n^{1/2+\delta}$. Thus the probability to succeed in one trial is at least

\[
\left(\frac{|\mathcal{V}_n^c| - b_n}{n}\right)^{O((\log n)^{1-\varepsilon})} \ge \left(e^{-\tau_k} - O\left(\frac{1}{(\log n)^{1+\varepsilon}}\right)\right)^{O((\log n)^{1-\varepsilon})} = e^{-O((\log n)^{1-\varepsilon})}.
\]

Therefore, the probability that the first $\lceil n/(\log n)^2\rceil$ trials fail is at most

\[
\left(1 - e^{-O((\log n)^{1-\varepsilon})}\right)^{\lceil n/(\log n)^2\rceil} \le \exp\left\{-e^{-O((\log n)^{1-\varepsilon})}\frac{n}{(\log n)^2}\right\} = o(1).
\]

Thus whp $\max_{v\in\mathcal{V}_n^c}W_v^* \ge \psi_n$. □


3.3.5 The longest path outside the giant 

This subsection proves (d) and (e) of Theorem 3. 
Lemma 11. For all $\varepsilon > 0$, we have:

\[
\sup_{\mathcal{V}_n\subseteq[n]:\,|\mathcal{V}_n|\in\mathcal{I}_n}\mathbb{P}\left\{\left|\frac{m(\mathcal{V}_n^c)}{\log n} - \frac{1}{\log(e^{\tau_k}/k)}\right| > \varepsilon\right\} = o(1),
\]

where $m(\mathcal{V}_n^c)$ denotes the length of the longest path in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$; and

\[
\sup_{\mathcal{V}_n\subseteq[n]:\,|\mathcal{V}_n|\in\mathcal{I}_n}\mathbb{P}\left\{\left|\frac{d(\mathcal{V}_n^c)}{\log n} - \frac{1}{\log(e^{\tau_k}/k)}\right| > \varepsilon\right\} = o(1),
\]

where $d(\mathcal{V}_n^c)$ denotes the maximal distance between two connected vertices in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$.

Since $m(\mathcal{V}_n^c) \ge d(\mathcal{V}_n^c)$, it suffices to prove the upper bound for $m(\mathcal{V}_n^c)$ and the lower bound for $d(\mathcal{V}_n^c)$.

Proof of the upper bound. Let $\omega_n = (1+\varepsilon)\log n/\log(e^{\tau_k}/k)$. Let $X_\ell$ be the number of labeled paths of length $\ell$ in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$. There are fewer than $|\mathcal{V}_n^c|^{\ell+1}k^{\ell}$ possible such paths. Each of them exists with probability $(1/n)^{\ell}$. Recall that $|\mathcal{V}_n| \in \mathcal{I}_n$ implies $|\mathcal{V}_n^c| \le e^{-\tau_k}n + n^{1/2+\delta}$. Thus

\[
\mathbb{E}X_\ell \le |\mathcal{V}_n^c|^{\ell+1}k^{\ell}\left(\frac{1}{n}\right)^{\ell} \le \left(e^{-\tau_k}n + n^{1/2+\delta}\right)\left(ke^{-\tau_k} + kn^{-1/2+\delta}\right)^{\ell}.
\]

Since $ke^{-\tau_k} < 1$ (Lemma A1), for $n$ large enough,

\[
\sum_{\omega_n\le\ell\le|\mathcal{V}_n^c|}\mathbb{E}X_\ell \le n\sum_{\omega_n\le\ell}\left(ke^{-\tau_k} + kn^{-1/2+\delta}\right)^{\ell} = O\left(n\,(ke^{-\tau_k})^{\omega_n}\right) = O(n^{-\varepsilon}).
\]

Thus $\mathbb{P}\{m(\mathcal{V}_n^c) \ge \omega_n\} = O(n^{-\varepsilon})$. □

Proof of the lower bound. Let $\psi_n = \lceil (1-\varepsilon)\log n/\log(1/(ke^{-\tau_k}))\rceil$. To show there are two vertices at distance within $[\psi_n, \infty)$, pick an arbitrary yellow vertex $v$ and run the colouring process until either a vertex at distance $\psi_n$ from $v$ has been colored (success), or $\lceil(\log n)^2\rceil$ vertices have been colored (failure), or the process terminates because all vertices that are reachable from $v$ in $\mathcal{D}_{n,k}[\mathcal{V}_n^c]$ have been discovered (failure). If the process fails, we pick another yellow vertex and try again until one trial succeeds.

If at most $t_n = \lfloor n/(\log n)^4\rfloor$ trials are made, then at most $\lceil(\log n)^2\rceil t_n = O(n/(\log n)^2)$ vertices are colored. So in the first $t_n$ trials, when an arc is colored, the probability that it is colored yellow is at least $\mu_n = (|\mathcal{V}_n^c| - O(n/(\log n)^2))/n = e^{-\tau_k} - O(1/(\log n)^2)$. Let $(Z_m)_{m\ge 0}$ be a Galton-Watson process with offspring distribution $\mathrm{Bin}(k,\mu_n)$ and $Z_0 = 1$. In other words, $Z_{m+1} = \sum_{j=1}^{Z_m}X_{m,j}$, where $(X_{m,j})_{m\ge 0,\,j\ge 1}$ are i.i.d. $\mathrm{Bin}(k,\mu_n)$. Then the probability that one trial succeeds is at least $\mathbb{P}\{Z_{\psi_n} > 0\}$ minus the probability that in a trial $\lceil(\log n)^2\rceil$ vertices have been colored, which is $O(n^{-1-\varepsilon})$ by (3) in Lemma 8.

Let $\varphi_m(y) = \mathbb{E}y^{Z_m}$, i.e., $\varphi_m(y)$ is the probability generating function of $Z_m$. Thus $\mathbb{P}\{Z_m = 0\} = \varphi_m(0)$. Since $ke^{-\tau_k} < 1/2$ (Lemma A1), for $n$ large enough $k\mu_n < 1/2$. So we can apply Lemma AT in the appendix to show that

\[
\varphi_m(0) \le 1 - (k\mu_n)^{m} + (k\mu_n)^{m+1} \le 1 - \tfrac{1}{2}(k\mu_n)^{m}, \quad \text{for all } m \ge 0.
\]

Recalling that $\psi_n = \lceil (1-\varepsilon)\log n/\log(1/(ke^{-\tau_k}))\rceil$,

\[
\mathbb{P}\{Z_{\psi_n} > 0\} = 1 - \varphi_{\psi_n}(0) \ge \frac{1}{2}\left(ke^{-\tau_k} - O\left(\frac{1}{(\log n)^2}\right)\right)^{\psi_n} = \Omega(n^{-1+\varepsilon}).
\]

So the probability that one trial succeeds is $\Omega(n^{-1+\varepsilon}) - O(n^{-1-\varepsilon}) = \Omega(n^{-1+\varepsilon})$. (The $O(n^{-1-\varepsilon})$ term is the probability that one trial colors too many vertices.) Thus the probability that the first $t_n = \lfloor n/(\log n)^4\rfloor$ trials fail is at most

\[
\left(1 - \Omega(n^{-1+\varepsilon})\right)^{t_n} \le \exp\left\{-\Omega\left(\frac{1}{n^{1-\varepsilon}}\cdot\frac{n}{(\log n)^4}\right)\right\} = o(1).
\]

Therefore whp $d(\mathcal{V}_n^c) \ge \psi_n$. □


4 Phase transition in strong connectivity 

Now instead of assuming that $k$ is fixed, let $k \to \infty$ as $n \to \infty$. Let $K$ be a fixed integer. We can construct $\mathcal{D}_{n,k}$ by first generating $\mathcal{D}_{n,K}$ and then adding arcs with labels in $\{K+1,\ldots,k\}$ into it. By Lemma 2, for all $\varepsilon > 0$, there exists a $K$ depending only on $\varepsilon$ such that whp in $\mathcal{D}_{n,K}$ the largest closed SCC has size at least $(1-\varepsilon)n$ and is reachable from all vertices. Since adding arcs can only increase the size of this SCC, whp $\mathcal{D}_{n,k}$ has an SCC of size at least $(1-\varepsilon)n$ that is reachable from all vertices.

In fact, if $k$ increases fast enough, then whp $\mathcal{D}_{n,k}$ is strongly connected. More precisely, $\mathcal{D}_{n,k}$ exhibits a phase transition for strong connectivity similar to the analogous event in the Erdős–Rényi model [15].

Theorem 6. If $k - \log n \to -\infty$, then whp $\mathcal{D}_{n,k}$ is not strongly connected. If $k - \log n \to \infty$, then whp $\mathcal{D}_{n,k}$ is strongly connected.
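The threshold in Theorem 6 can be explored empirically. The following is a small simulation sketch, not part of the paper: it samples a $k$-out digraph with arc endpoints chosen independently and uniformly at random (as in the definition of the model) and tests strong connectivity by a forward and a backward breadth-first search from vertex 0. All function names are illustrative.

```python
import random
from collections import deque


def random_k_out(n, k, rng):
    """out[v][i] is the endpoint of the arc labelled i + 1 leaving vertex v."""
    return [[rng.randrange(n) for _ in range(k)] for _ in range(n)]


def reaches_all(adj, root=0):
    """True if every vertex is reachable from root along arcs of adj."""
    seen = [False] * len(adj)
    seen[root] = True
    queue, count = deque([root]), 1
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if not seen[w]:
                seen[w] = True
                count += 1
                queue.append(w)
    return count == len(adj)


def is_strongly_connected(out):
    """Strongly connected iff vertex 0 reaches, and is reached by, all vertices."""
    rev = [[] for _ in out]
    for v, targets in enumerate(out):
        for w in targets:
            rev[w].append(v)
    return reaches_all(out) and reaches_all(rev)


def estimate(n, k, trials, seed=0):
    """Fraction of sampled k-out digraphs that are strongly connected."""
    rng = random.Random(seed)
    hits = sum(is_strongly_connected(random_k_out(n, k, rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    n = 500  # log n is about 6.2
    for k in (2, 4, 6, 8, 10, 12):
        print(k, estimate(n, k, trials=50))
```

For moderate $n$ the transition is not sharp; the theorem only describes the limit in which $k - \log n$ tends to $\pm\infty$.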

If there is a vertex with in-degree zero, then obviously the digraph is not strongly 
connected. Thus the following lemma proves the lower bound in Theorem 6. 

Lemma 12. If $k - \log n \to -\infty$, whp $\mathcal{D}_{n,k}$ contains a vertex of in-degree zero.

Proof. Let $\omega_n = \log n - k$. For vertex $i \in [n]$, let $X_i$ be the indicator that $i$ has in-degree zero. Let $N = \sum_{i=1}^{n}X_i$. We use the second moment method to show that $N \ge 1$ whp.

To have $X_1 = 1$, $nk$ arcs need to avoid vertex 1 as their endpoints. Thus

\[
\mathbb{E}X_1 = \left(1 - \frac{1}{n}\right)^{nk} \ge e^{-nk(1/n + 1/n^2)} = e^{-k(1+1/n)} = \left(\frac{e^{\omega_n}}{n}\right)^{1+1/n}.
\]

Since by assumption $\omega_n \to \infty$, $\mathbb{E}N = n\,\mathbb{E}X_1 \ge e^{\omega_n(1+1/n)}/n^{1/n} \to \infty$.

To have $X_1X_2 = 1$, $nk$ arcs need to avoid vertices 1 and 2 as their endpoints. Thus $\mathbb{E}X_1X_2 = (1 - 2/n)^{nk}$. Therefore

\[
\frac{\mathbb{E}[X_1X_2]}{(\mathbb{E}[X_1])^2} = \frac{(1-2/n)^{nk}}{(1-1/n)^{2nk}} = \left(\frac{n^2-2n}{n^2-2n+1}\right)^{nk} = \left(1 - \frac{1}{(n-1)^2}\right)^{nk} \to 1,
\]

since $nk/(n-1)^2 = o(1)$. Thus

\[
1 \le \frac{\mathbb{E}[N^2]}{(\mathbb{E}N)^2} = \frac{\mathbb{E}N + n(n-1)\mathbb{E}[X_1X_2]}{(\mathbb{E}N)^2} \le \frac{1}{\mathbb{E}N} + \frac{\mathbb{E}[X_1X_2]}{(\mathbb{E}X_1)^2} \to 1.
\]

Therefore $\mathbb{P}\{N = 0\} \le \operatorname{Var}(N)/(\mathbb{E}N)^2 = \mathbb{E}[N^2]/(\mathbb{E}N)^2 - 1 \to 0$. □
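As a numerical illustration of the first moment used above, the expected number of in-degree-zero vertices is $\mathbb{E}N = n(1-1/n)^{nk}$, which is roughly $ne^{-k}$. The helper below (a hypothetical name, not from the paper) evaluates it in log space to avoid underflow in the power.

```python
import math


def expected_in_degree_zero(n: int, k: int) -> float:
    """E[N] = n * (1 - 1/n)**(n*k): expected number of vertices with no in-arc."""
    return n * math.exp(n * k * math.log1p(-1.0 / n))


if __name__ == "__main__":
    n = 1000  # log n is about 6.9
    for k in (3, 7, 20):
        print(k, expected_in_degree_zero(n, k))
```

With $n = 1000$ the expectation is about 50 for $k = 3$, close to 1 for $k = 7 \approx \log n$, and negligible for $k = 20$, in line with the two regimes of Theorem 6.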
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Given a set of vertices S, if there are no arcs that start from S c = [n] \ S and end 
at $S$, then call $S$ a non-leaf. If $\mathcal{D}_{n,k}$ is not strongly connected, then there must exist a non-leaf set of vertices $S$ with $|S| < n$. Thus the following lemma implies the upper bound in Theorem 6.


Lemma 13. If $k - \log n \to +\infty$, whp there does not exist a non-leaf set of vertices $S$ with $|S| < n$.


Proof. By the argument at the beginning of this section, whp $\mathcal{D}_{n,k}$ contains an SCC of size at least $n/2$ that is reachable from all vertices. So if $|S| \ge n/2$, then $S$ contains part of this SCC and cannot be a non-leaf. Thus it suffices to prove the lemma for $S$ with $|S| < n/2$.

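Let $\omega_n = k - \log n$. For $s \in [\lfloor n/2\rfloor]$, let $X_s$ be the number of non-leaf sets of vertices of size $s$. Thus

\[
\mathbb{E}X_s = \binom{n}{s}\left(1 - \frac{s}{n}\right)^{k(n-s)} \le \binom{n}{s}e^{-ks(1-s/n)}. \tag{4}
\]

Therefore for $s \le n/\log n$,

\[
\mathbb{E}X_s \le \frac{n^s}{s!}e^{-ks(1-s/n)} \le \frac{1}{s!}\left(\frac{n}{e^{k(1-s/n)}}\right)^{s} \le \frac{1}{s!}\left(\frac{n}{(ne^{\omega_n})^{1-1/\log n}}\right)^{s} = \frac{a_n^s}{s!}.
\]

By assumption $\omega_n \to \infty$. Thus $a_n = n^{1/\log n}/e^{\omega_n(1-1/\log n)} = e^{1-\omega_n(1-1/\log n)} = o(1)$. Therefore,

\[
\sum_{1\le s\le n/\log n}\mathbb{E}X_s \le \sum_{1\le s}\frac{a_n^s}{s!} = e^{a_n} - 1 = o(1).
\]

On the other hand, it follows from (4) that for $n/\log n < s < n/2$,

\[
\mathbb{E}X_s \le \left(\frac{en}{s}\right)^{s}e^{-ks(1-s/n)} = \left(\frac{en}{s\,e^{k(1-s/n)}}\right)^{s} \le \left(\frac{e\log n}{e^{k/2}}\right)^{s} = \left(\frac{e\log n}{(ne^{\omega_n})^{1/2}}\right)^{s} = \beta_n^{s}.
\]

Since $\beta_n = e\log n/(ne^{\omega_n})^{1/2} = o(1)$,

\[
\sum_{n/\log n<s<n/2}\mathbb{E}X_s \le \sum_{1\le s}\beta_n^{s} = O(\beta_n) = o(1).
\]

Thus $\mathbb{P}\left\{\sum_{1\le s<n/2}X_s \ge 1\right\} \le \sum_{1\le s<n/2}\mathbb{E}X_s = o(1)$. □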
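The first-moment sum in this proof can also be evaluated exactly for concrete $n$ and $k$. The sketch below assumes the formula $\mathbb{E}X_s = \binom{n}{s}(1-s/n)^{k(n-s)}$ from (4) and sums it over $1 \le s < n/2$, working with logarithms so that the large binomial coefficients do not overflow. The function names are illustrative only.

```python
import math


def log_binom(n: int, s: int) -> float:
    """Natural logarithm of the binomial coefficient C(n, s)."""
    return math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)


def non_leaf_first_moment(n: int, k: int) -> float:
    """Sum over 1 <= s < n/2 of C(n, s) * (1 - s/n)**(k * (n - s))."""
    total = 0.0
    for s in range(1, (n - 1) // 2 + 1):  # largest integer s with s < n/2
        log_term = log_binom(n, s) + k * (n - s) * math.log1p(-s / n)
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    n = 200  # log n is about 5.3
    for k in (3, 6, 10, 20):
        print(k, non_leaf_first_moment(n, k))
```

For $n = 200$ the sum exceeds 1 when $k = 3$ and is below $10^{-5}$ when $k = 20$; in both cases it is dominated by the $s = 1$ term, i.e. by single vertices of in-degree zero.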


5 The simple digraph model, the number of self-loops 
and multiple arcs 

A simple digraph is one in which there are no self-loops and there is no more than one 
arc from one vertex to another. Let $\mathcal{D}^*_{n,k}$ denote a simple $k$-out digraph with $n$ vertices chosen uniformly at random from all such digraphs. $\mathcal{D}^*_{n,k}$ can be viewed as $\mathcal{D}_{n,k}$ restricted to the event that $\mathcal{D}_{n,k}$ is simple. This section proves the following theorem:

Theorem 7. The probability that $\mathcal{D}_{n,k}$ is simple converges to $e^{-k-\binom{k}{2}}$ as $n \to \infty$.


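Theorem 7 can be proved directly as follows. Let $\mathbb{1}_v$ be the indicator that the $k$ arcs starting from vertex $v$ do not end at $v$ and no two of them end at the same vertex. Then

\[
\mathbb{P}\{\mathbb{1}_v = 1\} = \frac{(n-1)(n-2)\cdots(n-k)}{n^{k}} = 1 - \frac{k(k+1)}{2n} + O\left(\frac{1}{n^2}\right).
\]

Since $\mathcal{D}_{n,k}$ is simple if and only if $\cap_{v=1}^{n}[\mathbb{1}_v = 1]$ happens, we have

\[
\mathbb{P}\{\mathcal{D}_{n,k}\text{ is simple}\} = \mathbb{P}\left\{\cap_{v=1}^{n}[\mathbb{1}_v = 1]\right\} = \prod_{v=1}^{n}\mathbb{P}\{\mathbb{1}_v = 1\} = \left(1 - \frac{k(k+1)}{2n} + O\left(\frac{1}{n^2}\right)\right)^{n} \to e^{-k(k+1)/2} = e^{-k-\binom{k}{2}}.
\]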
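The product formula in this direct proof is exact for every finite $n$, so the rate of convergence to $e^{-k(k+1)/2}$ can be checked directly. The short sketch below uses hypothetical helper names and evaluates the product in log space.

```python
import math


def prob_simple(n: int, k: int) -> float:
    """Exact P{D_{n,k} is simple} = ((n-1)(n-2)...(n-k) / n**k) ** n."""
    if k >= n:
        return 0.0  # some factor (n - j) is zero or negative: no simple k-out digraph
    log_p_vertex = sum(math.log1p(-j / n) for j in range(1, k + 1))
    return math.exp(n * log_p_vertex)


def limit_prob_simple(k: int) -> float:
    """Limiting value exp(-k(k+1)/2) from Theorem 7."""
    return math.exp(-k * (k + 1) / 2)


if __name__ == "__main__":
    for n in (10, 100, 10 ** 4, 10 ** 6):
        print(n, prob_simple(n, 2), limit_prob_simple(2))
```

For $k = 2$ the exact probability approaches $e^{-3} \approx 0.0498$ from below as $n$ grows.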


However, we can say more about self-loops and multiple arcs between vertices. Let $\mathcal{I} = [n]\times[k]$. For $(v,i) \in \mathcal{I}$, define the random variable $\mathbb{1}_{v,i}$ to be the indicator that the arc with label $i$ starting from vertex $v$ forms a self-loop. Let $\mathcal{J} = \{(v,i,j) \in [n]\times[k]\times[k] : i < j\}$. For $(v,i,j) \in \mathcal{J}$, define the random variable $\mathbb{1}_{v,i,j}$ to be the indicator that the two arcs starting from vertex $v$ with labels $i$ and $j$ both end at the same vertex. Let $S_n = \sum_{\alpha\in\mathcal{I}}\mathbb{1}_\alpha$ and $M_n = \sum_{\alpha\in\mathcal{J}}\mathbb{1}_\alpha$. Then $[S_n = 0]\cap[M_n = 0]$ happens if and only if $\mathcal{D}_{n,k}$ is simple.

Lemma 14. Let $S$ and $M$ be two independent Poisson random variables of means $k$ and $\binom{k}{2}$ respectively. Then $(S_n, M_n) \xrightarrow{d} (S, M)$ as $n \to \infty$. In fact,

\[
\|(S_n, M_n), (S, M)\|_{TV} = o(1).
\]

Indeed the lemma implies that as $n \to \infty$,

\[
\mathbb{P}\{\mathcal{D}_{n,k}\text{ is simple}\} = \mathbb{P}\{S_n = M_n = 0\} \to \mathbb{P}\{S = 0\}\,\mathbb{P}\{M = 0\} = e^{-k}e^{-\binom{k}{2}}.
\]

Remark. Bollobás [9] proved a theorem similar to Lemma 14 for the configuration model (see also Bollobás [8, sec. 2.4]). Many authors have extended this result under various conditions, see, e.g., McKay [30], McKay and Wormald [31], Janson [23, 24]. Our proof uses Stein's method, which may also be applied to self-loops and multiple edges in the configuration model to get proofs shorter than previous ones.

Proof of Lemma 14. We use the Chen-Stein method [11]. Since the probability that an arc forms a self-loop is $1/n$,

\[
\mathbb{E}S_n = \sum_{(v,i)\in\mathcal{I}}\mathbb{E}\mathbb{1}_{v,i} = nk\cdot\frac{1}{n} = k.
\]

Thus $\mathbb{E}S = k = \mathbb{E}S_n$. Since the probability that two arcs with the same start point have the same endpoint is also $1/n$,

\[
\mathbb{E}M_n = \sum_{v\in[n]}\sum_{1\le i<j\le k}\mathbb{E}\mathbb{1}_{v,i,j} = n\binom{k}{2}\frac{1}{n} = \frac{k(k-1)}{2}.
\]

Thus $\mathbb{E}M = k(k-1)/2 = \mathbb{E}M_n$.

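For $\alpha \in \mathcal{I}\cup\mathcal{J}$, let

\[
B_\alpha = \{\beta \in \mathcal{I}\cup\mathcal{J} : \mathbb{1}_\beta \text{ and } \mathbb{1}_\alpha \text{ are dependent}\}.
\]

(Note that $\alpha \in B_\alpha$.) Define

\[
b_1 = \sum_{\alpha\in\mathcal{I}\cup\mathcal{J}}\sum_{\beta\in B_\alpha}\mathbb{E}[\mathbb{1}_\alpha]\,\mathbb{E}[\mathbb{1}_\beta], \qquad
b_2 = \sum_{\alpha\in\mathcal{I}\cup\mathcal{J}}\sum_{\beta\in B_\alpha:\,\beta\ne\alpha}\mathbb{E}[\mathbb{1}_\alpha\mathbb{1}_\beta], \qquad
b_3 = \sum_{\alpha\in\mathcal{I}\cup\mathcal{J}}s_\alpha,
\]

where

\[
s_\alpha = \mathbb{E}\left|\mathbb{E}\left[\mathbb{1}_\alpha \mid \sigma(\mathbb{1}_\beta : \beta \in [\mathcal{I}\cup\mathcal{J}]\setminus B_\alpha)\right] - \mathbb{E}\mathbb{1}_\alpha\right|.
\]

By [11, thm. 2], if $b_1 + b_2 + b_3 \to 0$, then $(S_n, M_n) \xrightarrow{d} (S, M)$. Since $\mathbb{1}_\alpha$ is independent of the random variables $\mathbb{1}_\beta$ with $\beta \in [\mathcal{I}\cup\mathcal{J}]\setminus B_\alpha$, we have $s_\alpha = 0$ and thus $b_3 = 0$.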

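For $(v,i) \in \mathcal{I}$, $\mathbb{1}_{v,i}$ depends on the random variables $\mathbb{1}_{v,r,s}$ with $1 \le r < s \le k$ and $i \in \{r,s\}$, of which there are $k-1$. Thus $|B_{v,i}| = 1 + (k-1) = k < 2k$. For $(v,i,j) \in \mathcal{J}$, $\mathbb{1}_{v,i,j}$ depends on $\mathbb{1}_{v,i}$ and $\mathbb{1}_{v,j}$. It also depends on the random variables $\mathbb{1}_{v,r,s}$ with $1 \le r < s \le k$ and $\{r,s\}\cap\{i,j\} \ne \emptyset$, of which there are $2(k-1) - 1 = 2k-3$. Thus $|B_{v,i,j}| = 2 + 2k - 3 < 2k$. So for all $\alpha \in \mathcal{I}\cup\mathcal{J}$, $|B_\alpha| < 2k$. Therefore

\[
b_1 = \sum_{\alpha\in\mathcal{I}}\sum_{\beta\in B_\alpha}\mathbb{E}[\mathbb{1}_\alpha]\,\mathbb{E}[\mathbb{1}_\beta] + \sum_{\alpha\in\mathcal{J}}\sum_{\beta\in B_\alpha}\mathbb{E}[\mathbb{1}_\alpha]\,\mathbb{E}[\mathbb{1}_\beta]
\le nk\times 2k\times\frac{1}{n}\times\frac{1}{n} + n\binom{k}{2}\times 2k\times\frac{1}{n}\times\frac{1}{n} = O\left(\frac{1}{n}\right).
\]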

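Consider $(v,i) \in \mathcal{I}$. If $\beta \in B_{v,i}\cap\mathcal{I}$, then $\beta = (v,i)$. If $\beta \in B_{v,i}\cap\mathcal{J}$, then $\beta = (v,r,s)$ for some $(r,s)$ with $i \in \{r,s\}$. Then $\mathbb{1}_{v,i}\mathbb{1}_{v,r,s} = 1$ if and only if the two arcs starting from vertex $v$ labeled $r$ and $s$ respectively both end at $v$. Thus $\mathbb{E}[\mathbb{1}_{v,i}\mathbb{1}_{v,r,s}] = 1/n^2$. Therefore

\[
b_{2,\mathcal{I}} = \sum_{\alpha\in\mathcal{I}}\sum_{\beta\in B_\alpha:\,\beta\ne\alpha}\mathbb{E}[\mathbb{1}_\alpha\mathbb{1}_\beta] = \sum_{\alpha\in\mathcal{I}}\sum_{\beta\in B_\alpha\cap\mathcal{J}}\mathbb{E}[\mathbb{1}_\alpha\mathbb{1}_\beta] \le nk\times 2k\times\frac{1}{n^2} = O\left(\frac{1}{n}\right).
\]

Consider $(v,r,s) \in \mathcal{J}$. If $(v,i) \in B_{v,r,s}$, then $(v,r,s) \in B_{v,i}$. Thus by the above argument $\mathbb{E}[\mathbb{1}_{v,r,s}\mathbb{1}_{v,i}] = 1/n^2$. If $(v,i,j) \in B_{v,r,s}$ and $(i,j) \ne (r,s)$, then $|\{r,s\}\cup\{i,j\}| = 3$. So $\mathbb{1}_{v,r,s}\mathbb{1}_{v,i,j} = 1$ iff the three arcs starting from vertex $v$ with labels in $\{r,s\}\cup\{i,j\}$ all end at the same vertex. Thus $\mathbb{E}[\mathbb{1}_{v,r,s}\mathbb{1}_{v,i,j}] = 1/n^2$. Therefore

\[
b_{2,\mathcal{J}} = \sum_{\alpha\in\mathcal{J}}\sum_{\beta\in B_\alpha:\,\beta\ne\alpha}\mathbb{E}[\mathbb{1}_\alpha\mathbb{1}_\beta] \le n\binom{k}{2}\times 2k\times\frac{1}{n^2} = O\left(\frac{1}{n}\right).
\]

Thus $b_2 = b_{2,\mathcal{I}} + b_{2,\mathcal{J}} = O(1/n)$. □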




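Corollary 2. Let $\mathcal{E}$ be a set of digraphs. If $\mathcal{D}_{n,k} \in \mathcal{E}$ whp, then $\mathcal{D}^*_{n,k} \in \mathcal{E}$ whp.

Proof. We have

\[
\mathbb{P}\{\mathcal{D}^*_{n,k} \notin \mathcal{E}\} = \mathbb{P}\{\mathcal{D}_{n,k} \notin \mathcal{E} \mid \mathcal{D}_{n,k}\text{ is simple}\} \le \frac{\mathbb{P}\{\mathcal{D}_{n,k} \notin \mathcal{E}\}}{\mathbb{P}\{\mathcal{D}_{n,k}\text{ is simple}\}} = o(1),
\]

since by Theorem 7 the denominator converges to a positive constant. □

This corollary implies that all previous results in the form of “whp $\mathcal{D}_{n,k}$ ...” can be automatically translated into “whp $\mathcal{D}^*_{n,k}$ ...”. For example, the statement of Theorem 3 with $\mathcal{D}_{n,k}$ replaced by $\mathcal{D}^*_{n,k}$ is still true.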

Corollary 3. Let $\mathcal{D}^{**}_{n,k}$ be a digraph chosen uniformly at random from all simple and arc-unlabeled $k$-out digraphs with $n$ vertices. If whp $\mathcal{D}_{n,k}$ has property $P$ where $P$ does not depend on arc-labels, then whp $\mathcal{D}^{**}_{n,k}$ has property $P$.

Proof. Note that: (a) for each digraph in the space of $\mathcal{D}^{**}_{n,k}$, there are $(k!)^n$ ways to arc-label it to get $(k!)^n$ different digraphs in the space of $\mathcal{D}^*_{n,k}$; (b) no two different arc-unlabeled digraphs can be turned into the same digraph by arc-labeling. So there exists a $(k!)^n$-to-one surjective mapping from the space of $\mathcal{D}^*_{n,k}$ to the space of $\mathcal{D}^{**}_{n,k}$. Thus $\mathcal{D}^{**}_{n,k}$ can be viewed as $\mathcal{D}^*_{n,k}$ with arc labels removed. Since $P$ does not depend on arc-labels, it follows from Corollary 2 that whp $\mathcal{D}^{**}_{n,k}$ has property $P$. □


6 The typical distance 


The typical distance $H_n$ of $\mathcal{D}_{n,k}$ is the distance between two vertices $v_1$ and $v_2$ chosen uniformly at random. If $v_1$ cannot reach $v_2$, then $H_n = \infty$. Addario-Berry et al. [1] proved that conditioned on $H_n < \infty$, $H_n/\log_k n \xrightarrow{p} 1$. This section¹ gives an alternative proof using the path counting technique invented by van der Hofstad [40, chap. 3.5].

Theorem 8 (The typical distance). For all $\varepsilon > 0$,

\[
\mathbb{P}\left\{\left|\frac{H_n}{\log_k n} - 1\right| > \varepsilon \,\middle|\, H_n < \infty\right\} = o(1).
\]


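By Theorem 1, $|\mathcal{S}_{v_1}|/n \xrightarrow{p} \nu_k$, where $\mathcal{S}_{v_1}$ is the spectrum of $v_1$. Thus $\mathbb{P}\{H_n < \infty\} = \mathbb{P}\{v_2 \in \mathcal{S}_{v_1}\} \to \nu_k > 0$. Therefore

\[
\mathbb{P}\{H_n < (1-\varepsilon)\log_k n \mid H_n < \infty\} = \frac{\mathbb{P}\{H_n < (1-\varepsilon)\log_k n\}}{\mathbb{P}\{H_n < \infty\}} \sim \frac{\mathbb{P}\{H_n < (1-\varepsilon)\log_k n\}}{\nu_k},
\]

and

\[
\mathbb{P}\{H_n > (1+\varepsilon)\log_k n \mid H_n < \infty\} = \frac{\mathbb{P}\{(1+\varepsilon)\log_k n < H_n < \infty\}}{\mathbb{P}\{H_n < \infty\}} \sim \frac{\mathbb{P}\{(1+\varepsilon)\log_k n < H_n < \infty\}}{\nu_k} = \frac{\mathbb{P}\{B_n\}}{\nu_k}.
\]

Thus it suffices to show that $\mathbb{P}\{H_n < (1-\varepsilon)\log_k n\}$ and $\mathbb{P}\{B_n\}$ are both $o(1)$.

¹ In a shorter version of this paper, this section is omitted.
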
Lemma 15 (Lower bound of the typical distance). For all £ > 0, 

\[
\mathbb{P}\{H_n < (1-\varepsilon)\log_k n\} = o(1).
\]


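Proof. Let $N_\ell$ denote the number of paths from $v_1$ to $v_2$ of length $\ell$. Consider such a path without labels on internal vertices and arcs. There are at most $n^{\ell-1}$ ways to label its internal vertices and there are at most $k^{\ell}$ ways to label its arcs. And the probability that such a labeled path appears is $(1/n)^{\ell}$. Thus

\[
\mathbb{E}N_\ell \le n^{\ell-1}k^{\ell}\left(\frac{1}{n}\right)^{\ell} = \frac{k^{\ell}}{n}.
\]

Let $\omega_n = (1-\varepsilon)\log_k n$. Then

\[
\sum_{\ell\le\omega_n}\mathbb{E}N_\ell \le \sum_{\ell\le\omega_n}\frac{k^{\ell}}{n} = \frac{O(k^{\omega_n})}{n} = \frac{O(n^{1-\varepsilon})}{n} = o(1).
\]

Thus $\mathbb{P}\{H_n \le \omega_n\} = \mathbb{P}\left\{\sum_{\ell\le\omega_n}N_\ell \ge 1\right\} = o(1)$. □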
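To see how small the path-counting bound is in practice, one can simply evaluate $\sum_{\ell\le\omega_n}k^{\ell}/n$. The sketch below does this for paths of length $1 \le \ell \le \lfloor(1-\varepsilon)\log_k n\rfloor$; starting the sum at $\ell = 1$ (that is, ignoring the event $v_1 = v_2$) is an assumption of this illustration, and the function name is hypothetical.

```python
import math


def short_path_bound(n: int, k: int, eps: float) -> float:
    """Upper bound sum_{l=1}^{floor((1-eps) log_k n)} k**l / n on E[# short paths]."""
    omega = math.floor((1.0 - eps) * math.log(n, k))
    return sum(k ** ell for ell in range(1, omega + 1)) / n


if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 6, 10 ** 9):
        print(n, short_path_bound(n, 2, 0.5))
```

With $k = 2$ and $\varepsilon = 1/2$ the bound is about $2/\sqrt{n}$, consistent with the $O(n^{-\varepsilon})$ rate obtained in the proof.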

The rest of this section is organized as follows: Subsection 6.1 shows that if $v_1$ can reach $v_2$ but only through a very long path, then it is very likely that $v_1$ can reach a lot of vertices and a lot of vertices can reach $v_2$. Subsection 6.2 computes a lower bound of the probability that there is a path of specific length from one large set of vertices to another large set of vertices. Finally, subsection 6.3 shows that these results together imply the upper bound in Theorem 8, i.e., $\mathbb{P}\{B_n\} = o(1)$.


6.1 Comparison to Galton-Watson processes 

Let $\mathcal{S}_m^+(v)$ and $\mathcal{S}_m^-(v)$ be the sets of vertices at distance exactly $m$ from or to vertex $v$ respectively. Let $\mathcal{S}_{\le m}^+(v)$ and $\mathcal{S}_{\le m}^-(v)$ be the sets of vertices at distance at most $m$ from or to vertex $v$ respectively. The following proposition shows that for fixed $m$, we can perfectly couple $(|\mathcal{S}_t^+(v_1)|, |\mathcal{S}_t^-(v_2)|)_{t=0}^{m}$ with two independent Galton-Watson processes. It is inspired by a similar result for the configuration model by van der Hofstad [40, sec. 5.2], but the coupling method used here is new.

Proposition 1. Let $(S_t)_{t\ge 0}$ be a Galton-Watson process with a binomial offspring distribution $\mathrm{Bin}(kn, 1/n)$. For all fixed $m \ge 1$, there exists a coupling $\left((k^t, Y_t)_{t=0}^{m}, (Y_t^+, Y_t^-)_{t=0}^{m}\right)$ of $(k^t, S_t)_{t=0}^{m}$ and $(|\mathcal{S}_t^+(v_1)|, |\mathcal{S}_t^-(v_2)|)_{t=0}^{m}$, such that

\[
\mathbb{P}\left\{(k^t, Y_t)_{t=0}^{m} \ne (Y_t^+, Y_t^-)_{t=0}^{m}\right\} = o(1).
\]

Proof. We construct an incremental sequence of random digraphs, denoted by $(\mathcal{D}^{t}_{n,k})_{t\ge 0}$, through a signal spreading process. Let $\mathcal{D}^{0}_{n,k}$ be a digraph of vertex set $[n]$ that has no arcs. Without loss of generality, let $v_1 = 1$ and $v_2 = 2$. At time 0, put a ⊕ signal at $v_1$ and put a ⊖ signal at $v_2$.

If a ⊕ signal reaches a vertex $v$ at time $t$, then at time $t + 1/3$ the vertex $v$ grows $k$ out-arcs labeled $1,\ldots,k$ from itself to $k$ endpoints chosen independently and uar from all the $n$ vertices. Then the ⊕ signal splits into $k$ ⊕ signals and each of them picks a different newly-grown out-arc and travels along the arc's direction to reach its endpoint at time $t + 1$.

If a ⊖ signal reaches a vertex $v$ at time $t$, then at time $t + 2/3$ the vertex $v$ grows a random number $X$ of in-arcs from itself to $X$ random vertices as follows: Let $(X_{i,j})_{i\in[n],\,j\in[k]}$ be i.i.d. Bernoulli $1/n$ random variables. If $X_{i,j} = 1$, then $v$ grows an in-arc from itself to vertex $i$ with label $j$. Thus in total $X = \sum_{i\in[n],\,j\in[k]}X_{i,j}$ in-arcs are grown from $v$. Then the ⊖ signal splits into $X$ ⊖ signals and each of them picks a different newly-grown in-arc and travels against the arc's direction to reach its starting vertex at time $t + 1$. If $X = 0$, then the ⊖ signal vanishes.

Let $\mathcal{D}^{t}_{n,k}$ be the digraph generated in the above process at time $t$. Let $\mathcal{Y}_t^+$ and $\mathcal{Y}_t^-$ be the sets of vertices that are visited by ⊕ and ⊖ signals at time $t$ respectively. Let $\mathcal{Y}_{\le t}^+$ and $\mathcal{Y}_{\le t}^-$ be the sets of vertices that have been visited by ⊕ and ⊖ signals before time $t + 1$ respectively. At time $t$, if a signal visits a vertex in $[\mathcal{Y}_{\le t-1}^+\cup\mathcal{Y}_{\le t-1}^-]$ or if two signals visit the same vertex, then we say a collision happens. Let $T$ be the first time when a collision happens.

Table 1 lists the types of events that make a collision happen. Three of them need special attention for reasons that will become clear soon. First, if multiple ⊖ signals visit the same vertex $v$, then multiple arcs with the same label and $v$ as the starting point may grow. If this happens we pick an arbitrary arc among them and call the others duplicate. Second, a ⊕ signal may visit a vertex in $\mathcal{Y}_{\le t-1}^-$ through a newly-grown out-arc. Finally, a ⊖ signal may visit a vertex in $\mathcal{Y}_{\le t-1}^+$ through a newly-grown in-arc. We call the newly-grown arcs traversed in these two cases biased.


Table 1: Events that lead to a collision. Three special types of events are marked. (Column headings: signals visit the same vertex; signals visit $\mathcal{Y}_{\le t-1}^+$; signals visit $\mathcal{Y}_{\le t-1}^-$. Diagrams omitted.)


We construct a random $k$-out graph $\widetilde{\mathcal{D}}_{n,k}$ as follows: First remove all duplicate and all biased arcs in $\mathcal{D}^{T}_{n,k}$. Then for each pair $(v,i) \in [n]\times[k]$, if vertex $v$ does not have an out-arc labeled $i$, then add such an out-arc with its endpoint chosen uar from $[n]\setminus\mathcal{Y}_{\le T-1}^-$. Denote the resulting digraph by $\widetilde{\mathcal{D}}_{n,k}$.

The seemingly complicated $\widetilde{\mathcal{D}}_{n,k}$ is nothing but $\mathcal{D}_{n,k}$ in disguise. In $\mathcal{D}_{n,k}$ the endpoints of the arcs are chosen uar and simultaneously. In $\widetilde{\mathcal{D}}_{n,k}$, the endpoints of the arcs are still chosen uar but in several steps. First we mark the arcs whose end (start) vertices are at distance $t$ from $v_1$ (to $v_2$) for $t = 1,\ldots,T$. To have $\widetilde{\mathcal{D}}_{n,k} \overset{d}{=} \mathcal{D}_{n,k}$, obviously duplicate arcs must be removed. The biased arcs also cause trouble as their endpoints are chosen non-uniformly. For example, if at time $T$ a ⊕ signal visits a vertex in $\mathcal{Y}_{\le T-1}^-$, then an in-arc is added to a vertex whose in-arcs have already been decided by time $T - 1$. Thus biased arcs must also be removed. Finally, we add arcs that are still missing in $\widetilde{\mathcal{D}}_{n,k}$ and choose their endpoints uar from $[n]\setminus\mathcal{Y}_{\le T-1}^-$, i.e., from those vertices whose in-arcs have not yet been marked. Thus we have $\widetilde{\mathcal{D}}_{n,k} \overset{d}{=} \mathcal{D}_{n,k}$. Let $Y_t^+$ and $Y_t^-$ be the number of vertices in $\widetilde{\mathcal{D}}_{n,k}$ at distance $t$ from $v_1$ and to $v_2$ respectively. Then

\[
(Y_t^+, Y_t^-)_{t\ge 0} \overset{d}{=} (|\mathcal{S}_t^+(v_1)|, |\mathcal{S}_t^-(v_2)|)_{t\ge 0}.
\]

A ⊕ signal always splits into $k$ ⊕ signals after it arrives at a vertex. Thus at a non-negative integer time $t$ there are in total $k^t$ ⊕ signals. On the other hand, the number of ⊖ signals at time $t$, denoted by $Y_t$, is random. Each time a ⊖ signal splits, it splits into $\mathrm{Bin}(kn, 1/n)$ signals. Because the splits are mutually independent, $(Y_t)_{t\ge 0}$ has the same distribution as $(S_t)_{t\ge 0}$, the Galton-Watson process with offspring distribution $\mathrm{Bin}(kn, 1/n)$.

Assume that $T > m$. Then the part of $\widetilde{\mathcal{D}}_{n,k}$ within distance $m$ from $v_1$ or to $v_2$ is determined by $\mathcal{D}^{m}_{n,k}$. Thus for $t \le m$, in $\widetilde{\mathcal{D}}_{n,k}$ a vertex is at distance $t$ from $v_1$ if and only if it has a ⊕ signal at time $t$, and a vertex is at distance $t$ to $v_2$ if and only if it has a ⊖ signal at time $t$. This implies that $(k^t, Y_t)_{t=0}^{m} = (Y_t^+, Y_t^-)_{t=0}^{m}$. Thus to finish the proof, it suffices to show the following lemma:

Lemma 16. For all fixed integers $m \ge 1$, whp $T > m$.

The intuition is that since $m$ is fixed, for $t \le m$, most likely $|\mathcal{Y}_{\le t}^+\cup\mathcal{Y}_{\le t}^-|$ is small. Thus it is unlikely that a collision happens at time $t + 1$. See the end of this subsection for a detailed proof. □

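Corollary 4. Let $\omega_n \to \infty$ be an arbitrary sequence. Let $M, \delta, \varepsilon$ be three arbitrary positive numbers. Let $\psi_n = \lfloor (1+\varepsilon)\log_k n\rfloor$. Let

\[
A_n(M,m) = \left[M \le |\mathcal{S}_m^+(v_1)|\right]\cap\left[M \le |\mathcal{S}_m^-(v_2)|\right]\cap\left[|\mathcal{S}_{\le m}^-(v_2)| \le \omega_n\right].
\]

Then there exists $m \ge 1$ such that

\[
\limsup_{n\to\infty}\mathbb{P}\left\{A_n(M,m)^c\cap[\psi_n < H_n < \infty]\right\} < \delta.
\]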

Proof. Let $(k^t, Y_t)_{t=0}^{m}$ be the coupling of $(|\mathcal{S}_t^+(v_1)|, |\mathcal{S}_t^-(v_2)|)_{t=0}^{m}$ constructed in Proposition 1. Thus $(Y_t)_{t\ge 0}$ is a Galton-Watson process with $\mathrm{Bin}(kn, 1/n)$ offspring distribution, i.e., $Y_0 = 1$ and $Y_t = \sum_{i=1}^{Y_{t-1}}X_{t,i}$ for $t \ge 1$, where the $X_{t,i}$'s are i.i.d. $\mathrm{Bin}(kn, 1/n)$. Since $\mathbb{E}X_{1,1} = k > 1$, the survival probability of this process is a constant $\eta > 0$ (see [39, thm. 3.1]). For the same reason, $Y_t/k^t \to Y_\infty$ almost surely for some random variable $Y_\infty$ (see [39, thm. 3.9]). Since $\mathbb{E}[X_{1,1}^2] < \infty$, by the Kesten-Stigum Theorem [39, thm. 3.10], $\mathbb{P}\{Y_\infty > 0\} = \eta$. Thus by the Bounded Convergence Theorem [13, thm. 1.5.3],

\[
\lim_{m\to\infty}\mathbb{P}\{Y_m \ge M\} = \lim_{m\to\infty}\mathbb{P}\left\{\frac{Y_m}{k^m} \ge \frac{M}{k^m}\right\} = \mathbb{P}\{Y_\infty > 0\} = \eta.
\]

For the same reason $\mathbb{P}\{Y_m \ge 1\} \to \eta$ as $m \to \infty$. Thus

\[
\lim_{m\to\infty}\mathbb{P}\{1 \le Y_m < M\} = \lim_{m\to\infty}\left(\mathbb{P}\{Y_m \ge 1\} - \mathbb{P}\{Y_m \ge M\}\right) = 0.
\]

Thus we can choose $m$ large enough such that $\mathbb{P}\{1 \le Y_m < M\} < \delta/2$ and that $k^m \ge M$.

Recall that $B_n = [\psi_n < H_n < \infty]$. When $n$ is large enough, $\psi_n > m$. Thus $B_n$ implies that $|\mathcal{S}_m^-(v_2)| \ge 1$. Define the event

\[
C_n = \left[(k^t, Y_t)_{t=0}^{m} = (|\mathcal{S}_t^+(v_1)|, |\mathcal{S}_t^-(v_2)|)_{t=0}^{m}\right].
\]

By Proposition 1, $\mathbb{P}\{C_n^c\} = o(1)$ as $n \to \infty$. Therefore

\[
\begin{aligned}
\mathbb{P}\{A_n(M,m)^c\cap B_n\} &\le \mathbb{P}\{C_n^c\} + \mathbb{P}\{A_n(M,m)^c\cap C_n\cap B_n\} \\
&\le o(1) + \mathbb{P}\left\{[k^m < M]\cup[1 \le Y_m < M]\cup\left[\omega_n < \sum_{t=0}^{m}Y_t\right]\right\} \\
&\le o(1) + \mathbb{P}\{k^m < M\} + \mathbb{P}\{1 \le Y_m < M\} + \mathbb{P}\left\{\omega_n < \sum_{t=0}^{m}Y_t\right\} \\
&= o(1) + 0 + \delta/2 + o(1),
\end{aligned}
\]

where the last equality is due to our choice of $m$ and that $\mathbb{E}\sum_{t=0}^{m}Y_t = \sum_{t=0}^{m}k^t = O(1)$. □
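The survival probability $\eta$ used in this proof can be approximated numerically. The following sketch relies on the standard fact that the extinction probability of a Galton-Watson process is the smallest fixed point in $[0,1]$ of the offspring generating function, which for $\mathrm{Bin}(kn, 1/n)$ offspring is $\varphi(y) = (1 - (1-y)/n)^{kn}$; iterating $\varphi$ from 0 converges to that fixed point. The function name and the iteration count are choices made for this illustration.

```python
import math


def survival_probability(n: int, k: int, iterations: int = 500) -> float:
    """Survival probability of a Galton-Watson process with Bin(k*n, 1/n) offspring.

    Iterates q <- phi(q) from q = 0, where phi(y) = (1 - (1 - y)/n)**(k*n);
    the limit is the extinction probability, so the result is 1 - q.
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp(k * n * math.log1p(-(1.0 - q) / n))
    return 1.0 - q


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, survival_probability(10 ** 6, k))
```

For large $n$ the offspring law is close to Poisson($k$), whose survival probability solves $1 - \eta = e^{-k\eta}$; for $k = 2$ this gives $\eta \approx 0.797$.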

Proof of Lemma 16. Recall that $\mathcal{Y}_t^+$ and $\mathcal{Y}_t^-$ are the sets of vertices that are reached at time $t$ by a ⊕ signal or ⊖ signal respectively. Let $\mathcal{M}_{m-1} = \cup_{t=0}^{m-1}[\mathcal{Y}_t^+\cup\mathcal{Y}_t^-]$. Define the event $A_m = \cap_{i\in[4]}E_{m,i}$ where the $E_{m,i}$'s are defined as follows:

• $E_{m,1}$: The out-arcs that grow from vertices in $\mathcal{Y}_{m-1}^+$ all end at different vertices in $[n]\setminus\mathcal{M}_{m-1}$. Thus at time $m$ all ⊕ signals visit different vertices and these vertices have never been visited by signals before.

• $E_{m,2}$: There are no in-arcs that grow from vertices in $\mathcal{Y}_{m-1}^-$ that have starting vertices in $\mathcal{M}_{m-1}\cup\mathcal{Y}_m^+$. Thus at time $m$ all ⊖ signals visit vertices that have never been visited by signals before and that are not reached by ⊕ signals at time $m$.

• $E_{m,3}$: There are no two in-arcs that grow from vertices in $\mathcal{Y}_{m-1}^-$ that have the same starting vertex. Thus at time $m$ all ⊖ signals reach different vertices.






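The event $A_t$ implies that no collision happens at time $t$. Thus $\bigcap_{t=0}^{m} A_t$ implies that no collision has happened by time $m$, and thus $T > m$. We show by induction that $\mathbf{P}\{\bigcap_{t=0}^{m} A_t\} = 1 - o(1)$.

Since $|\mathcal{Y}_0^{\pm}| = 1$ and there is no arc-growing before time 0, $\mathbf{P}\{A_0\} = 1$, which is the induction basis. Now assume that $\mathbf{P}\{\bigcap_{t=0}^{m-1} A_t\} = 1 - o(1)$. Then

\[ \mathbf{P}\left\{ \bigcap_{t=0}^{m} A_t \right\} = \mathbf{P}\left\{ A_m \,\Big|\, \bigcap_{t=0}^{m-1} A_t \right\} \mathbf{P}\left\{ \bigcap_{t=0}^{m-1} A_t \right\} = \mathbf{P}\left\{ A_m \,\Big|\, \bigcap_{t=0}^{m-1} A_t \right\} (1 - o(1)). \]

Thus it suffices to show that

\[ \mathbf{P}\left\{ A_m^c \,\Big|\, \bigcap_{t=0}^{m-1} A_t \right\} = \mathbf{P}\left\{ \bigcup_{i \in [4]} E_{m,i}^c \,\Big|\, \bigcap_{t=0}^{m-1} A_t \right\} \le \sum_{i \in [4]} \mathbf{P}\left\{ E_{m,i}^c \,\Big|\, \bigcap_{t=0}^{m-1} A_t \right\} = o(1). \]
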

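The event $\bigcap_{t=0}^{m-1} A_t$ implies that

\[ |\mathcal{M}_{m-1}| \le \sum_{t=0}^{m-1} |\mathcal{Y}_t^+| + \sum_{t=0}^{m-1} |\mathcal{Y}_t^-| \le \sum_{t=0}^{m-1} k^t + \sum_{t=0}^{m-1} (\log n)^t = O\left( (\log n)^{m-1} \right). \]

For $E_{m,1}$ to happen, the $k^m$ arcs that grow out of $\mathcal{Y}_{m-1}^+$ must end at different vertices in $[n] \setminus \mathcal{M}_{m-1}$. Thus

\[ \mathbf{P}\left\{ E_{m,1} \,\Big|\, \bigcap_{t=0}^{m-1} A_t \right\} = \prod_{0 \le i < k^m} \frac{n - |\mathcal{M}_{m-1}| - i}{n} \ge \left( 1 - \frac{O\left( (\log n)^{m-1} \right)}{n} \right)^{k^m} = 1 - o(1). \]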


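For $E_{m,2}$ to happen, the vertices in $\mathcal{Y}_{m-1}^-$ cannot grow in-arcs that have starting vertex in $\mathcal{M}_{m-1} \cup \mathcal{Y}_m^+$. The event $\bigcap_{t=0}^{m-1} A_t$ implies that $|\mathcal{Y}_{m-1}^-| \le (\log n)^{m-1}$. Since deterministically $|\mathcal{Y}_m^+| = k^m$, $|\mathcal{M}_{m-1} \cup \mathcal{Y}_m^+| = O((\log n)^{m-1})$. Thus the number of in-arcs that need to not grow at time $m - 1/3$ to make sure that $E_{m,2}$ happens is at most

\[ k \, |\mathcal{Y}_{m-1}^-| \, |\mathcal{M}_{m-1} \cup \mathcal{Y}_m^+| = O\left( (\log n)^{2m} \right). \]

Since an in-arc does not grow with probability $1 - 1/n$,

\[ \mathbf{P}\left\{ E_{m,2} \,\Big|\, \bigcap_{t=0}^{m-1} A_t \right\} \ge \left( 1 - \frac{1}{n} \right)^{O((\log n)^{2m})} = 1 - o(1). \]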

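Let $X_v$ be the number of in-arcs that grow from $\mathcal{Y}_{m-1}^-$ and that have starting vertex $v$. Conditioned on $\mathcal{Y}_{m-1}^-$, $X_v$ is distributed as $\mathrm{Bin}(k|\mathcal{Y}_{m-1}^-|, 1/n)$. Since $\bigcap_{t=0}^{m-1} A_t$ implies $|\mathcal{Y}_{m-1}^-| \le (\log n)^{m-1}$,

\[
\begin{aligned}
\mathbf{P}\left\{ X_v \le 1 \,\Big|\, \bigcap_{t=0}^{m-1} A_t \right\}
&\ge \mathbf{P}\left\{ \mathrm{Bin}\left( k(\log n)^{m-1}, \tfrac{1}{n} \right) \le 1 \right\} \\
&= \left( 1 - \frac{1}{n} \right)^{k(\log n)^{m-1}} + \frac{k(\log n)^{m-1}}{n} \left( 1 - \frac{1}{n} \right)^{k(\log n)^{m-1} - 1} \\
&= 1 - O\left( \frac{(\log n)^{2(m-1)}}{n^2} \right).
\end{aligned}
\]

Since for two different vertices $u$ and $v$, $X_u$ and $X_v$ depend on disjoint sets of arcs, $(X_u)_{u \in [n]}$ are mutually independent. Thus

\[ \mathbf{P}\left\{ E_{m,3} \,\Big|\, \bigcap_{t=0}^{m-1} A_t \right\} = \mathbf{P}\left\{ \bigcap_{v \in [n]} [X_v \le 1] \,\Big|\, \bigcap_{t=0}^{m-1} A_t \right\} \ge \left( 1 - O\left( \frac{(\log n)^{2(m-1)}}{n^2} \right) \right)^{n} = 1 - o(1). \]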


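Since $(|\mathcal{Y}_t^-|)_{t \ge 1}$ is a Galton-Watson process with a $\mathrm{Bin}(kn, 1/n)$ offspring distribution, $\mathbf{E}|\mathcal{Y}_m^-| = k^m$. Thus $\mathbf{P}\{|\mathcal{Y}_m^-| > (\log n)^m\} = o(1)$ by Markov's inequality. Therefore

\[ \mathbf{P}\left\{ E_{m,4}^c \,\Big|\, \bigcap_{t=0}^{m-1} A_t \right\} \equiv \mathbf{P}\left\{ |\mathcal{Y}_m^-| > (\log n)^m \,\Big|\, \bigcap_{t=0}^{m-1} A_t \right\} \le \frac{\mathbf{P}\{ |\mathcal{Y}_m^-| > (\log n)^m \}}{\mathbf{P}\{ \bigcap_{t=0}^{m-1} A_t \}} = o(1), \]

where the last equality is due to the induction assumption that $\mathbf{P}\{\bigcap_{t=0}^{m-1} A_t\} = 1 - o(1)$. □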


6.2 Path counting 


For three disjoint sets of vertices $A, B, C \subseteq [n]$, let $N_\ell$ denote the number of paths of length $\ell$ that start from $A$ and end at $B$, and that have all internal vertices in $C$. In the next subsection, we use the second moment method to lower bound $\mathbf{P}\{N_\ell \ge 1\}$, which requires estimates of $\mathbf{E}[N_\ell]$ and $\mathbf{Var}(N_\ell)$. The following proposition does so by using the path counting technique [40, chap. 3.5].


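Proposition 2. Let $\omega$, $\ell$ and $M$ be three positive integers, possibly depending on $n$. Let $A, B, C \subseteq [n]$ be disjoint sets of vertices with $|A| = |B| = M \ge 1$ and $|C| \ge n - \omega$. There exist constants $C_1$ and $C_2$ such that

\[ \mathbf{E} N_\ell \ge \frac{k^\ell M^2}{n} \left( 1 - \frac{(\omega + \ell)\ell}{n} \right), \tag{5} \]

and

\[ \mathbf{Var}(N_\ell) \le \mathbf{E} N_\ell + C_1 \frac{k^{2\ell} M^3}{n^2} + C_2 \frac{\ell^4 k^{2\ell} M^4}{n^3}. \tag{6} \]
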

Proof of (5). Note that if $n \le (\omega + \ell)\ell$, then (5) is trivially true. So we assume that $n > (\omega + \ell)\ell$. We simplify by contracting $A$ and $B$ into two special vertices $v_a$ and $v_b$. The vertex $v_a$ has out-degree $kM$ and the vertex $v_b$ has probability $M/n$ to be chosen as the endpoint of each arc. Consider an unlabeled path of length $\ell \ge 1$ from $v_a$ to $v_b$. There are $kM$ ways to label the first arc. There are $k^{\ell-1}$ ways to label the other arcs. Recall that $(x)_y = x(x-1)(x-2)\cdots(x-y+1)$. There are $(|C|)_{\ell-1}$ ways to label the internal vertices of the path. The probability that a vertex-and-arc labeled path of length $\ell$ from $v_a$ to $v_b$ exists is $(1/n)^{\ell-1}(M/n)$. Thus

\[
\mathbf{E} N_\ell = (kM)\, k^{\ell-1}\, (|C|)_{\ell-1} \left( \frac{1}{n} \right)^{\ell-1} \frac{M}{n}
\ge \frac{k^\ell M^2}{n} \left( 1 - \frac{\omega + \ell}{n} \right)^{\ell}
\ge \frac{k^\ell M^2}{n} \left( 1 - \frac{(\omega + \ell)\ell}{n} \right),
\]

where the first inequality holds because each of the $\ell - 1$ factors of $(|C|)_{\ell-1}$ is at least $n - \omega - \ell$, and the last step is because $(1 - x)^y \ge 1 - xy$ when $0 \le x \le 1$ and $y \ge 1$. □
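As a sanity check of (5), the exact first moment from the proof can be compared with the lower bound for concrete parameters. This is only an illustrative sketch: the function names `expected_paths` and `lower_bound` and the sample parameters are assumptions, and `c_size` stands for $|C|$.

```python
from math import prod  # available in Python 3.8+


def expected_paths(n, k, M, ell, c_size):
    """Exact E[N_ell] from the proof of (5):
    (kM) * k^(ell-1) * (|C|)_{ell-1} * (1/n)^(ell-1) * (M/n)."""
    # (|C|)_{ell-1} / n^(ell-1), computed as a product of ratios
    # to avoid huge intermediate integers.
    falling_ratio = prod((c_size - i) / n for i in range(ell - 1))
    return (k * M) * k ** (ell - 1) * falling_ratio * (M / n)


def lower_bound(n, k, M, ell, omega):
    """Right-hand side of (5)."""
    return k ** ell * M ** 2 / n * (1 - (omega + ell) * ell / n)


if __name__ == "__main__":
    n, k, M, ell, omega = 1000, 2, 3, 5, 50
    print(expected_paths(n, k, M, ell, n - omega), lower_bound(n, k, M, ell, omega))
```

The check uses $|C| = n - \omega$, the smallest size allowed by Proposition 2; larger $|C|$ only increases the exact value.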
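Proof of (6). Let $\mathcal{L}$ be the space of all possible arc-and-vertex labeled paths of length $\ell$ from $v_a$ to $v_b$ through $C$. In other words, if $\alpha \in \mathcal{L}$, then

\[ \alpha = \left( v_0^{\alpha} = v_a,\ a_1^{\alpha},\ v_1^{\alpha},\ a_2^{\alpha},\ \ldots,\ v_{\ell-1}^{\alpha},\ a_\ell^{\alpha},\ v_\ell^{\alpha} = v_b \right), \]

where $a_1^{\alpha}, \ldots, a_\ell^{\alpha}$ are arc labels and $v_1^{\alpha}, \ldots, v_{\ell-1}^{\alpha}$ are different vertex labels in $C$. For $\alpha \in \mathcal{L}$, let $\mathbf{1}_\alpha$ be the indicator that $\alpha$ appears. Given two paths $\alpha, \beta \in \mathcal{L}$, call them arc-disjoint if there do not exist $i$ and $j$ such that $v_i^{\alpha} = v_j^{\beta}$ and $a_{i+1}^{\alpha} = a_{j+1}^{\beta}$. If two paths $\alpha$ and $\beta$ are arc-disjoint, then $\mathbf{1}_\alpha$ and $\mathbf{1}_\beta$ are independent, since they depend on the endpoints of two disjoint sets of arcs. Let $\alpha \sim \beta$ denote that $\alpha$ and $\beta$ are not arc-disjoint and that $\alpha$ and $\beta$ can both appear simultaneously. Then

\[
\begin{aligned}
\mathbf{Var}(N_\ell) &= \sum_{\alpha, \beta \in \mathcal{L}} \left( \mathbf{E}[\mathbf{1}_\alpha \mathbf{1}_\beta] - \mathbf{E}[\mathbf{1}_\alpha]\,\mathbf{E}[\mathbf{1}_\beta] \right) \\
&\le \sum_{\alpha, \beta \in \mathcal{L}} \mathbf{1}_{[\alpha \sim \beta]} \left( \mathbf{E}[\mathbf{1}_\alpha \mathbf{1}_\beta] - \mathbf{E}[\mathbf{1}_\alpha]\,\mathbf{E}[\mathbf{1}_\beta] \right) \\
&\le \mathbf{E} N_\ell + \sum_{\alpha, \beta \in \mathcal{L}} \mathbf{1}_{[\alpha \sim \beta]} \mathbf{1}_{[\alpha \ne \beta]} \mathbf{E}[\mathbf{1}_\alpha \mathbf{1}_\beta] \\
&\equiv \mathbf{E} N_\ell + I.
\end{aligned}
\]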


To bound $I$, we use a technique called path counting. Consider two paths $\alpha, \beta \in \mathcal{L}$ with $\alpha \sim \beta$ and $\alpha \ne \beta$. First colour all vertices and arcs in $\alpha$ and $\beta$ white. Then colour all vertices and arcs shared by $\alpha$ and $\beta$ black. After this, $\alpha$ and $\beta$ both contain the same number, say $m$, of white paths separated by black paths (possibly a single black vertex). Since both $\alpha$ and $\beta$ start and end with black paths, each of them contains $m + 1$ black paths. Define:

1. $\mathbf{x}_{m+1} = (x_1, \ldots, x_{m+1})$, where $x_i \ge 0$ denotes the length of the $i$-th black path in $\alpha$.

2. $\mathbf{s}_m = (s_1, \ldots, s_m)$, where $s_i \ge 1$ denotes the length of the $i$-th white path in $\alpha$.

3. $\mathbf{t}_m = (t_1, \ldots, t_m)$, where $t_i \ge 1$ denotes the length of the $i$-th white path in $\beta$.

4. $\mathbf{o}_{m+1} = (o_1, \ldots, o_{m+1})$ records the order in which black paths appear in $\beta$. Note that $o_1 = 1$, $o_{m+1} = m + 1$, and $(o_2, \ldots, o_m)$ is a permutation of $\{2, \ldots, m\}$.

Define the shape of $\alpha$ and $\beta$ by $\mathrm{Sh}(\alpha, \beta) = (\mathbf{x}_{m+1}, \mathbf{s}_m, \mathbf{t}_m, \mathbf{o}_{m+1})$.

Let $r$ be the number of arcs shared by $\alpha$ and $\beta$, i.e., $r = \sum_{i=1}^{m+1} x_i$. Since $\alpha \sim \beta$ and $\alpha \ne \beta$, $1 \le r < \ell$. Thus there are $\ell - r$ white arcs in $\alpha$. Since each white path contains at least one white arc, there are at most $\ell - r$ white paths in $\alpha$, i.e., $m \le \ell - r$. As $\alpha$ and $\beta$ must differ by at least one arc, $m \ge 1$. Let $\mathcal{S}_{m,r}$ denote the set of shapes of two paths in $\mathcal{L}$ that share $r$ arcs and each contains $m$ white paths. Then $I$ can be expressed as a sum over $r$, $m$ and $\mathcal{S}_{m,r}$ by

\[ I = \sum_{1 \le r < \ell}\ \sum_{1 \le m \le \ell - r}\ \sum_{\sigma \in \mathcal{S}_{m,r}}\ \sum_{\alpha, \beta \in \mathcal{L}} \mathbf{1}_{[\mathrm{Sh}(\alpha, \beta) = \sigma]} \mathbf{E}[\mathbf{1}_\alpha \mathbf{1}_\beta] = \sum_{1 \le m < \ell}\ \sum_{1 \le r \le \ell - m}\ \sum_{\sigma \in \mathcal{S}_{m,r}} I_\sigma, \]

where $I_\sigma$ denotes the innermost sum over pairs of paths with shape $\sigma$.

Figure 5: A pair of paths and their shape.


Now fix $m$, $r$ and a shape $\sigma = (\mathbf{x}_{m+1}, \mathbf{s}_m, \mathbf{t}_m, \mathbf{o}_{m+1}) \in \mathcal{S}_{m,r}$. Consider arcs in two paths $\alpha, \beta \in \mathcal{L}$ with $\mathrm{Sh}(\alpha, \beta) = \sigma$. Call those starting from $v_a$ a-arcs, those ending at $v_b$ b-arcs, and other arcs middle-arcs. Let $z_a = \mathbf{1}_{[x_1 = 0]}$ and $z_b = \mathbf{1}_{[x_{m+1} = 0]}$. In other words, $z_a$ is the indicator that $\alpha$ and $\beta$ do not share an a-arc, and $z_b$ is the indicator that they do not share a b-arc. Then $\alpha$ and $\beta$ contain $1 + z_a$ a-arcs and $1 + z_b$ b-arcs. Since $\alpha$ and $\beta$ are both of length $\ell$ and they share $r$ arcs, they contain $2\ell - r$ arcs in total. Thus they contain $2\ell - r - (1 + z_a) - (1 + z_b) = 2\ell - r - z_a - z_b - 2$ middle-arcs.

Recall that black paths are shared by $\alpha$ and $\beta$. Since the $i$-th black path is of length $x_i$, it contains $x_i + 1$ black vertices. So the number of vertices shared by the two paths is $\sum_{i=1}^{m+1} (x_i + 1) = r + m + 1$. Therefore in total there are $2(\ell + 1) - r - m - 1$ vertices in the two paths, and among them $2\ell - r - m - 1$ are internal vertices.

The above argument shows that, given two unlabeled paths of the shape $\sigma$, there are at most $n^{2\ell - r - m - 1}$ ways to choose the internal vertices. There are at most $(kM)^{1 + z_a}$ ways to label a-arcs. There are $k^{2\ell - r - z_a - z_b - 2}$ ways to label middle-arcs. There are at most $k^{1 + z_b}$ ways to label b-arcs. Thus

\[
\begin{aligned}
\left| \{ (\alpha, \beta) \in \mathcal{L} \times \mathcal{L} : \mathrm{Sh}(\alpha, \beta) = \sigma \} \right|
&\le n^{2\ell - r - m - 1} (kM)^{1 + z_a} k^{2\ell - r - z_a - z_b - 2} k^{z_b + 1} \\
&= n^{2\ell - r - m - 1} M^{1 + z_a} k^{2\ell - r}.
\end{aligned}
\]

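And the probability that a pair of paths with shape $\sigma$ does appear is

\[ \left( \frac{1}{n} \right)^{1 + z_a} \left( \frac{1}{n} \right)^{2\ell - r - z_a - z_b - 2} \left( \frac{M}{n} \right)^{1 + z_b} = \frac{M^{1 + z_b}}{n^{2\ell - r}}. \]

Together,

\[
I_\sigma \equiv \sum_{\alpha, \beta \in \mathcal{L}} \mathbf{1}_{[\mathrm{Sh}(\alpha, \beta) = \sigma]} \mathbf{E}[\mathbf{1}_\alpha \mathbf{1}_\beta]
\le n^{2\ell - r - m - 1} M^{1 + z_a} k^{2\ell - r} \cdot \frac{M^{1 + z_b}}{n^{2\ell - r}}
= \frac{k^{2\ell - r} M^{2 + z_a + z_b}}{n^{m+1}} \equiv K_{m,r,z_a,z_b}. \tag{7}
\]
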


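Let $\mathcal{S}_{m,r,z_a,z_b}$ be the set of shapes with parameters $m, r, z_a, z_b$. Then we have $\mathcal{S}_{m,r} = \bigcup_{z_a, z_b \in \{0,1\}} \mathcal{S}_{m,r,z_a,z_b}$, where the sets in the union are disjoint. Thus

\[
\begin{aligned}
I &= \sum_{1 \le m < \ell}\ \sum_{z_a, z_b \in \{0,1\}}\ \sum_{1 \le r \le \ell - m}\ \sum_{\sigma \in \mathcal{S}_{m,r,z_a,z_b}} I_\sigma \\
&\le \sum_{1 \le m < \ell}\ \sum_{z_a, z_b \in \{0,1\}}\ \sum_{1 \le r \le \ell - m} |\mathcal{S}_{m,r,z_a,z_b}|\, K_{m,r,z_a,z_b} \\
&= \sum_{z_a, z_b \in \{0,1\}}\ \sum_{1 \le r \le \ell - 1} |\mathcal{S}_{1,r,z_a,z_b}|\, K_{1,r,z_a,z_b} + \sum_{2 \le m < \ell}\ \sum_{z_a, z_b \in \{0,1\}}\ \sum_{1 \le r \le \ell - m} |\mathcal{S}_{m,r,z_a,z_b}|\, K_{m,r,z_a,z_b} \\
&\equiv I^{[1]} + I^{[\ge 2]}.
\end{aligned}
\]
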

By counting the choices of $\mathbf{x}_{m+1}$, $\mathbf{s}_m$, $\mathbf{t}_m$, $\mathbf{o}_{m+1}$, we can upper bound $|\mathcal{S}_{m,r,z_a,z_b}|$:

Lemma 17. If $m \ge z_a + z_b$, then

\[ |\mathcal{S}_{m,r,z_a,z_b}| \le (r + 1)^{m - z_a - z_b} \binom{\ell - r - 1}{m - 1} \binom{\ell - r - 1}{m - 1} (m - 1)!. \tag{8} \]

If $m < z_a + z_b$, then $|\mathcal{S}_{m,r,z_a,z_b}| = 0$.

Proof of Lemma 17. First consider $m \ge 2$, which implies that $m \ge z_a + z_b$. When $z_a = 1$, $x_1 = 0$. When $z_b = 1$, $x_{m+1} = 0$. Thus the number of ways to choose $\mathbf{x}_{m+1}$ is at most the number of ways to choose $m + 1 - z_a - z_b \ge 1$ ordered non-negative integers such that they sum to $r$, which is at most $(r + 1)^{m - z_a - z_b}$; this explains the first factor in (8). Similarly the second and third factors are the numbers of ways to choose $\mathbf{s}_m$ and $\mathbf{t}_m$ respectively, i.e., the number of ways to write $\ell - r$ as an ordered sum of $m$ positive integers. The last factor is the number of ways to choose $\mathbf{o}_{m+1}$, since $(o_2, \ldots, o_m)$ is a permutation of $\{2, \ldots, m\}$.

Now assume $m = 1$. If $z_a + z_b \le m = 1$, the above argument still works. If $z_a + z_b > 1$, then $z_a = z_b = 1$. In other words, the two paths do not share arcs at the beginning and at the end, and they must meet at at least one internal vertex. So in this shape, there must be at least two white sub-paths in each of the two paths, i.e., $m \ge 2$, which is a contradiction. Therefore, $|\mathcal{S}_{1,r,1,1}| = 0$. □
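The two counting facts behind the factors of (8) can be checked by brute force for small parameters. The sketch below is illustrative only (the helper names `count_nonneg` and `count_positive` are assumptions): it enumerates ordered tuples and compares the counts with the closed forms used in the proof.

```python
from itertools import product
from math import comb


def count_nonneg(parts, total):
    """Number of ordered tuples of `parts` non-negative integers summing to `total`."""
    return sum(1 for t in product(range(total + 1), repeat=parts) if sum(t) == total)


def count_positive(parts, total):
    """Number of ordered tuples of `parts` positive integers summing to `total`."""
    return sum(1 for t in product(range(1, total + 1), repeat=parts) if sum(t) == total)


if __name__ == "__main__":
    for parts in range(1, 5):
        for total in range(1, 7):
            c = count_nonneg(parts, total)
            # Stars and bars, and the cruder bound (total + 1)^(parts - 1).
            assert c == comb(total + parts - 1, parts - 1) <= (total + 1) ** (parts - 1)
            # Compositions of `total` into `parts` positive parts.
            assert count_positive(parts, total) == comb(total - 1, parts - 1)
    print("counting identities verified for small cases")
```

Here `parts` plays the role of $m + 1 - z_a - z_b$ (for the black paths) or $m$ (for the white paths), and `total` the role of $r$ or $\ell - r$.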

Lemma 18. $I^{[1]} \le 6 k^{2\ell} M^3 / n^2$.

Proof of Lemma 18. By (7) and the above lemma, 
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where the last step is because < r ' r /2 r — fi° x /2 x dx < 2. Similarly, 



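\[
\begin{aligned}
\sum_{1 \le r \le \ell - 1} |\mathcal{S}_{1,r,0,1}| \times K_{1,r,0,1}
&= \sum_{1 \le r \le \ell - 1} |\mathcal{S}_{1,r,1,0}| \times K_{1,r,1,0} \\
&\le \sum_{1 \le r \le \ell - 1} (r + 1)^0 \binom{\ell - r - 1}{0}^2 0!\, \frac{k^{2\ell - r} M^3}{n^2} \\
&\le \frac{k^{2\ell} M^3}{n^2} \sum_{1 \le r} \frac{1}{k^r}
\le \frac{k^{2\ell} M^3}{n^2} \sum_{1 \le r} \frac{1}{2^r}
= \frac{k^{2\ell} M^3}{n^2}.
\end{aligned}
\]

Also by Lemma 17, $|\mathcal{S}_{1,r,1,1}| = 0$. Thus

\[
I^{[1]} = \sum_{z_a, z_b \in \{0,1\}}\ \sum_{1 \le r \le \ell - 1} |\mathcal{S}_{1,r,z_a,z_b}| \times K_{1,r,z_a,z_b}
\le 4 \frac{k^{2\ell} M^2}{n^2} + 2 \frac{k^{2\ell} M^3}{n^2} + 0 \le 6 \frac{k^{2\ell} M^3}{n^2}.
\]
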

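Lemma 19. $I^{[\ge 2]} \le 4 \ell^4 k^{2\ell} M^4 / n^3$.

Proof of Lemma 19. By Lemma 17, for $r \in [1, \ell)$,

\[
\begin{aligned}
\sum_{z_a, z_b \in \{0,1\}} |\mathcal{S}_{m,r,z_a,z_b}| \times K_{m,r,z_a,z_b}
&\le \sum_{z_a, z_b \in \{0,1\}} (r + 1)^{m - z_a - z_b} \binom{\ell - r - 1}{m - 1}^2 (m - 1)!\, \frac{k^{2\ell - r} M^{2 + z_a + z_b}}{n^{m+1}} \\
&\le \ell^m\, \frac{\ell^{2(m-1)}}{(m - 1)!}\, \frac{k^{2\ell - r}}{n^{m+1}} \sum_{z_a, z_b \in \{0,1\}} M^{2 + z_a + z_b} \\
&\le \frac{\ell^{3m - 2} k^{2\ell - r}}{(m - 1)!\, n^{m+1}}\, 4M^4.
\end{aligned}
\]

Thus

\[
\begin{aligned}
\sum_{1 \le r \le \ell - m}\ \sum_{z_a, z_b \in \{0,1\}} |\mathcal{S}_{m,r,z_a,z_b}| \times K_{m,r,z_a,z_b}
&\le \sum_{1 \le r \le \ell - m} \frac{\ell^{3m - 2} k^{2\ell - r}}{(m - 1)!\, n^{m+1}}\, 4M^4 \\
&\le \frac{\ell^{3m - 2} k^{2\ell}}{(m - 1)!\, n^{m+1}}\, 4M^4 \sum_{1 \le r} \frac{1}{k^r} \\
&\le \frac{\ell^{3m - 2} k^{2\ell}}{(m - 1)!\, n^{m+1}}\, 4M^4.
\end{aligned}
\]

Therefore,

\[
\begin{aligned}
I^{[\ge 2]} &= \sum_{2 \le m < \ell}\ \sum_{1 \le r \le \ell - m}\ \sum_{z_a, z_b \in \{0,1\}} |\mathcal{S}_{m,r,z_a,z_b}| \times K_{m,r,z_a,z_b} \\
&\le \sum_{2 \le m} \frac{\ell^{3m - 2} k^{2\ell}}{(m - 1)!\, n^{m+1}}\, 4M^4
= \frac{\ell k^{2\ell}\, 4M^4}{n^2} \sum_{2 \le m} \frac{\ell^{3(m-1)}}{n^{m-1} (m - 1)!} \\
&\le \frac{\ell k^{2\ell}\, 4M^4}{n^2} \left( \exp\left\{ \frac{\ell^3}{n} \right\} - 1 \right)
\le 4 \frac{\ell^4 k^{2\ell} M^4}{n^3}.
\end{aligned}
\]

By Lemma 18 and Lemma 19,

\[ I = I^{[1]} + I^{[\ge 2]} \le 6 \frac{k^{2\ell} M^3}{n^2} + 4 \frac{\ell^4 k^{2\ell} M^4}{n^3}. \]

Thus $\mathbf{Var}(N_\ell) \le \mathbf{E}[N_\ell] + I \le \mathbf{E}[N_\ell] + 6 k^{2\ell} M^3 / n^2 + 4 \ell^4 k^{2\ell} M^4 / n^3$.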


□ 


□ 


6.3 Finishing the proof of Theorem 8 

Proof of the upper bound of the typical distance. We can assume $\varepsilon < 1/2$. Recall that $\psi_n = \lfloor (1 + \varepsilon) \log_k n \rfloor$ and that $B_n = [\psi_n < H_n < \infty]$. As argued at the beginning of this section, to finish the proof of Theorem 8, it suffices to show that $\mathbf{P}\{B_n\} = o(1)$.

Let $\omega_n = \psi_n$. Let $M, m$ be two positive integers which are picked later. Recall that $S_i^+(v)$ and $S_i^-(v)$ are the sets of vertices at distance exactly $i$ from or to vertex $v$ respectively, and that $S_{\le i}^+(v)$ and $S_{\le i}^-(v)$ are the sets of vertices at distance at most $i$ from or to $v$ respectively. The following argument shows that by properly choosing $M$ and $m$, the probability that there exists a path of length exactly $\psi_n - 2m$ from $S_m^+(v_1)$ to $S_m^-(v_2)$ is at least $1 - \delta$ for $n$ large enough, where $\delta > 0$ is arbitrary and fixed.

Let the event $A_n(M, m)$ be defined as in Corollary 4, i.e.,

\[ A_n(M, m) = [M \le |S_m^+(v_1)|] \cap [M \le |S_m^-(v_2)|] \cap [|S_{\le m}^-(v_2)| \le \omega_n]. \]

Since each vertex has out-degree exactly $k \ge 2$, deterministically,

\[ |S_{\le m-1}^+(v_1)| \le 1 + k + \cdots + k^{m-1} < k^m, \qquad |S_m^+(v_1)| \le k^m. \]

Since $\psi_n > 2m$ for $n$ large enough, $B_n$ implies that $S_{\le m}^+(v_1)$ and $S_{\le m}^-(v_2)$ are disjoint. Thus the event $A_n(M, m) \cap B_n$ implies that $(S_{\le m-1}^+(v_1), S_m^+(v_1), S_m^-(v_2), S_{\le m-1}^-(v_2)) \in \mathcal{A}$, where $\mathcal{A}$ is a set of quadruples of disjoint sets of vertices defined by

\[ \mathcal{A} = \{ (\mathcal{S}_1, \mathcal{S}_2, \mathcal{S}_3, \mathcal{S}_4) : v_1 \in \mathcal{S}_1;\ v_2 \in \mathcal{S}_4;\ |\mathcal{S}_1| < k^m;\ M \le |\mathcal{S}_2| \le k^m;\ M \le |\mathcal{S}_3|;\ |\mathcal{S}_3 \cup \mathcal{S}_4| \le \omega_n \}. \]

For $\mathbf{S} = (\mathcal{S}_1, \mathcal{S}_2, \mathcal{S}_3, \mathcal{S}_4) \in \mathcal{A}$, define the event

\[ A'_n(\mathbf{S}) = [S_{\le m-1}^+(v_1) = \mathcal{S}_1] \cap [S_m^+(v_1) = \mathcal{S}_2] \cap [S_m^-(v_2) = \mathcal{S}_3] \cap [S_{\le m-1}^-(v_2) = \mathcal{S}_4]. \]

Thus $[B_n \cap A_n(M, m)] \subseteq \bigcup_{\mathbf{S} \in \mathcal{A}} [B_n \cap A'_n(\mathbf{S})]$ and the events in the union are disjoint.

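Now fix an $\mathbf{S} \in \mathcal{A}$. Let $A_{\mathbf{S}}$ and $B_{\mathbf{S}}$ be arbitrary subsets of $\mathcal{S}_2$ and $\mathcal{S}_3$ respectively with $|A_{\mathbf{S}}| = M$ and $|B_{\mathbf{S}}| = M$. Let $N_{\mathbf{S}}$ be the number of paths of length $\psi_n - 2m$ that start from $A_{\mathbf{S}}$ and end at $B_{\mathbf{S}}$, and that contain internal vertices only in $C_{\mathbf{S}} = [n] \setminus \bigcup_{i \in [4]} \mathcal{S}_i$. Thus there are $|C_{\mathbf{S}}| = n - |\bigcup_{i \in [4]} \mathcal{S}_i| \ge n - (\omega_n + 2k^m)$ vertices that can be internal vertices of these paths. By (5) of Proposition 2,

\[
\mathbf{E} N_{\mathbf{S}} \ge \frac{k^{\psi_n - 2m} M^2}{n} \left( 1 - \frac{(\omega_n + 2k^m + \psi_n - 2m)(\psi_n - 2m)}{n} \right)
\ge \frac{k^{(1 + \varepsilon) \log_k n - 1 - 2m} M^2}{n} \cdot \frac{1}{2}
= \frac{n^{\varepsilon} M^2}{k^{2m+1}} \cdot \frac{1}{2},
\]

for $n$ large enough. By (6) of Proposition 2,

\[
\begin{aligned}
\mathbf{Var}(N_{\mathbf{S}}) &\le \mathbf{E} N_{\mathbf{S}} + C_1 \frac{k^{2(\psi_n - 2m)} M^3}{n^2} + C_2 \frac{k^{2(\psi_n - 2m)} M^4 (\psi_n - 2m)^4}{n^3} \\
&\le \mathbf{E} N_{\mathbf{S}} + C_1 \frac{n^{2(1 + \varepsilon)} M^3}{n^2 k^{4m}} + C_2 \frac{n^{2(1 + \varepsilon)} M^4 \psi_n^4}{n^3 k^{4m}} \\
&\le \mathbf{E} N_{\mathbf{S}} + C_1 \frac{n^{2\varepsilon} M^3}{k^{4m}} + C_3 \frac{M^4 (\log n)^4}{k^{4m} n^{1 - 2\varepsilon}},
\end{aligned}
\]

where $C_3$ is a constant that does not depend on $M$ or $m$. Thus, by Chebyshev's inequality,

\[
\begin{aligned}
\mathbf{P}\{N_{\mathbf{S}} = 0\} \le \frac{\mathbf{Var}(N_{\mathbf{S}})}{(\mathbf{E} N_{\mathbf{S}})^2}
&\le \frac{2 k^{2m+1}}{n^{\varepsilon} M^2} + \frac{C_1 n^{2\varepsilon} M^3 k^{-4m}}{(n^{\varepsilon} M^2 2^{-1} k^{-2m-1})^2} + \frac{C_3 M^4 (\log n)^4 n^{2\varepsilon - 1} k^{-4m}}{(n^{\varepsilon} M^2 2^{-1} k^{-2m-1})^2} \\
&= \frac{2 k^{2m+1}}{n^{\varepsilon} M^2} + \frac{4 k^2 C_1}{M} + \frac{4 k^2 C_3 (\log n)^4}{n}.
\end{aligned}
\]

Later $m$ is chosen solely depending on $M$. Thus we can pick $M$ large enough such that for $n$ large enough, $\mathbf{P}\{N_{\mathbf{S}} = 0\} \le \delta/2$ for all $\mathbf{S} \in \mathcal{A}$.

If $H_n > \psi_n$, then there cannot exist paths of length $\psi_n - 2m$ from $S_m^+(v_1)$ to $S_m^-(v_2)$. Thus $B_n \cap A'_n(\mathbf{S})$ implies $[N_{\mathbf{S}} = 0] \cap A'_n(\mathbf{S})$. A crucial observation is that

\[ \mathbf{P}\{N_{\mathbf{S}} = 0 \mid A'_n(\mathbf{S})\} \le \mathbf{P}\{N_{\mathbf{S}} = 0\}. \]

This is because $A'_n(\mathbf{S})$ implies that arcs starting from vertices in $C_{\mathbf{S}}$ cannot choose vertices in $S_{\le m-1}^-(v_2) = \mathcal{S}_4$ as their endpoints, whereas when we compute $\mathbf{P}\{N_{\mathbf{S}} = 0\}$ without any condition, arcs starting from vertices in $C_{\mathbf{S}}$ are allowed to choose all vertices as their endpoints. Thus some of these arcs are possibly "wasted" by choosing their endpoints in $\mathcal{S}_4$. This increases the probability that $N_{\mathbf{S}} = 0$. Thus

\[
\begin{aligned}
\mathbf{P}\{B_n \cap A'_n(\mathbf{S})\} &\le \mathbf{P}\{[N_{\mathbf{S}} = 0] \cap A'_n(\mathbf{S})\} = \mathbf{P}\{N_{\mathbf{S}} = 0 \mid A'_n(\mathbf{S})\}\, \mathbf{P}\{A'_n(\mathbf{S})\} \\
&\le \mathbf{P}\{N_{\mathbf{S}} = 0\}\, \mathbf{P}\{A'_n(\mathbf{S})\} \le \frac{\delta}{2}\, \mathbf{P}\{A'_n(\mathbf{S})\}.
\end{aligned}
\]

Therefore

\[ \mathbf{P}\{B_n \cap A_n(M, m)\} \le \sum_{\mathbf{S} \in \mathcal{A}} \mathbf{P}\{B_n \cap A'_n(\mathbf{S})\} \le \sum_{\mathbf{S} \in \mathcal{A}} \frac{\delta}{2}\, \mathbf{P}\{A'_n(\mathbf{S})\} \le \frac{\delta}{2}. \]

By Corollary 4, we can choose $m$ depending on $M$ such that for $n$ large enough, $\mathbf{P}\{B_n \cap A_n^c(M, m)\} < \delta/2$. Thus

\[ \limsup_{n \to \infty} \mathbf{P}\{B_n\} = \limsup_{n \to \infty} \left( \mathbf{P}\{B_n \cap A_n(M, m)\} + \mathbf{P}\{B_n \cap A_n^c(M, m)\} \right) \le \delta. \]

Since $\delta > 0$ is arbitrary, $\mathbf{P}\{B_n\} = o(1)$. □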

7 Extensions 

Addario-Berry et al. [1] also proved that the diameter of the giant component divided by $\log n$ converges in probability to $1/\log(k) + 1/\log(1/\lambda_k)$. Recall that the longest path outside the giant divided by $\log n$ converges in probability to $1/\log(1/\lambda_k)$. This seems to be a strong indication that it might be possible to derive a new proof for the diameter of the giant.

Recall that $\mathcal{D}^*_{n,k}$ is a simple $k$-out digraph with $n$ vertices chosen uniformly at random from all such digraphs. Section 5 proved that if whp $\mathcal{D}_{n,k}$ has property $P$, then whp $\mathcal{D}^*_{n,k}$ has property $P$. But results like Theorem 1, the central limit law of the one-in-core, cannot be transferred to $\mathcal{D}^*_{n,k}$ automatically. We believe that it might be possible to get the same result for $\mathcal{D}^*_{n,k}$ following the line of Janson and Luczak's treatment of the configuration model [25].

A natural generalization of $\mathcal{D}_{n,k}$ is to have a deterministic out-degree sequence, as in the directed configuration model, instead of requiring each vertex to have out-degree exactly $k$. With some constraints on the out-degree sequence, most of our results should hold for this generalized model. Furthermore, we could let each vertex choose its out-degree independently at random from an out-degree distribution. Again by adding some restrictions on the out-degree distribution, most of our results should still hold.

The problem of generating a uniform random surjective function with fixed domain size is an open problem. Theorem 1 implies a simple algorithm for choosing a $[km] \to [m]$ surjective function uniformly at random. Let $n = \lfloor m/\nu_k \rfloor$. Then we generate a $\mathcal{D}_{n,k}$. If $|\mathcal{O}_n| = m$, i.e., if the one-in-core in $\mathcal{D}_{n,k}$ contains $m$ vertices, then $\mathcal{D}_{n,k}[\mathcal{O}_n]$ is equivalent to a uniform random sample of a $[km] \to [m]$ surjective function. Otherwise we try again until $|\mathcal{O}_n| = m$. Theorem 1 shows that $\mathbf{P}\{|\mathcal{O}_n| = m\} = \Theta(1/\sqrt{m})$. Thus the expected number of copies of $\mathcal{D}_{n,k}$ that need to be generated is $\Theta(\sqrt{m})$. Since generating a $\mathcal{D}_{n,k}$ takes $\Theta(m)$ time, the expected running time of the whole algorithm is $\Theta(m^{3/2})$. But we believe that $O(m)$ should be achievable.
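The rejection scheme described above can be sketched as follows. This is a hedged illustration, not the authors' implementation. It assumes that the one-in-core can be obtained by repeatedly deleting vertices of in-degree zero (the surviving set is closed and has minimum in-degree one), that $n$ is taken as $m/\nu_k$ rounded to an integer, and that the domain element $ik + j$ stands for arc $j$ of the $i$-th core vertex. The names `tau`, `peel`, `random_surjection` and the parameter `max_tries` are hypothetical.

```python
import math
import random
from collections import deque


def tau(k):
    """Positive root of 1 - t/k - exp(-t) = 0, by bisection on (k - 1/2, k)."""
    lo, hi = k - 0.5, float(k)
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - mid / k - math.exp(-mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def peel(arcs):
    """Repeatedly delete vertices of in-degree zero; return the surviving vertices."""
    n = len(arcs)
    indeg = [0] * n
    for outs in arcs:
        for w in outs:
            indeg[w] += 1
    alive = [True] * n
    queue = deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        v = queue.popleft()
        alive[v] = False
        for w in arcs[v]:
            indeg[w] -= 1
            if indeg[w] == 0 and alive[w]:
                queue.append(w)
    return [v for v in range(n) if alive[v]]


def random_surjection(m, k, rng, max_tries=100_000):
    """Rejection sampling of a surjection [km] -> [m], returned as a list of length k*m."""
    n = max(m, round(m / (tau(k) / k)))  # assumed choice of n, roughly m / nu_k
    for _ in range(max_tries):
        arcs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
        core = peel(arcs)
        if len(core) == m:
            index = {v: i for i, v in enumerate(core)}
            return [index[w] for v in core for w in arcs[v]]
    raise RuntimeError("no sample found within max_tries")


if __name__ == "__main__":
    print(random_surjection(12, 2, random.Random(0)))
```

Each attempt costs time linear in $n$, so the sketch matches the $\Theta(m^{3/2})$ expected running time discussed above only under the stated assumptions.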
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Appendix 

1. Inequalities for constants 

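Lemma A1. Assume that $k \ge 2$.

(a) There exists exactly one $\tau_k > 0$ such that $1 - \tau_k/k - e^{-\tau_k} = 0$;

(b) $0 < k - \tau_k < 1/2$;

(c) $1/2 \le 1 - 1/k < \nu_k \equiv \tau_k/k < 1$;

(d) $\lambda_k \equiv (k - \tau_k) \left( \frac{\tau_k}{k - 1} \right)^{k-1} < \lambda'_k \equiv (k - \tau_k) e^{1 - k + \tau_k} < 1$;

(e) $\gamma_k \equiv \left( \frac{k}{e \tau_k} \right)^{k} (e^{\tau_k} - 1) < 1$;

(f) $\rho_k \equiv k e^{1 - \tau_k} \nu_k^{k-1} < 1$;

(g) $\lambda_k = \Theta(k e^{-k})$ as $k \to \infty$.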


Proof. Let $\eta(x) = 1 - x/k - e^{-x}$. Since $\eta''(x) = -e^{-x} < 0$, $\eta(x)$ is strictly concave. Since $\eta(0) = 0$, $\eta(k - 1/2) > 0$, and $\eta(k) < 0$, $\eta(x) = 0$ must have exactly one positive solution and this solution must be in $(k - 1/2, k)$. Thus (a) and (b) are proved. (c) follows since $\tau_k/k > 1 - 1/k \ge 1/2$. For (d) note that $\lambda_k < \lambda'_k$ as $1 - x < e^{-x}$ for all $x \ne 0$. For $\lambda'_k < 1$ note that

\[ \log \lambda'_k = \log(k - \tau_k) + 1 - (k - \tau_k) = \log[1 - (1 - (k - \tau_k))] + 1 - (k - \tau_k) < 0, \]

since $\log(1 - x) < -x$ for all $x \in (0, 1)$.

For (e), first use $\tau_k/k = 1 - e^{-\tau_k}$ to get

\[ \log \gamma_k = \tau_k - k + (1 - k) \log(1 - e^{-\tau_k}). \]

Then use $k e^{-\tau_k} = k - \tau_k$ to get

\[
\begin{aligned}
\log \gamma_k &= (\tau_k - k) + \log(1 - e^{-\tau_k}) - k \log(1 - e^{-\tau_k}) \\
&< (\tau_k - k) - e^{-\tau_k} + k(e^{-\tau_k} + e^{-2\tau_k}) \\
&= (\tau_k - k) + (k - \tau_k) + e^{-\tau_k}(k - \tau_k - 1) < 0,
\end{aligned}
\]

since $-x > \log(1 - x) > -x - x^2$ for all $x \in (0, 1/2)$ and $e^{-\tau_k} = 1 - \nu_k \in (0, 1/2)$.

For (f), use $\tau_k < k$ from (a) to get

\[ \tau_k = k(1 - e^{-\tau_k}) < k(1 - e^{-k}). \tag{9} \]

Therefore,

\[ \nu_k = 1 - e^{-\tau_k} < 1 - \exp\{ -k(1 - e^{-k}) \}. \]

Again by (a), $\tau_k > k - 1/2$. Thus

\[ \tau_k = k(1 - e^{-\tau_k}) > k(1 - e^{-k + 1/2}). \tag{10} \]

Therefore,

\[ k e^{-\tau_k} < k \exp\{ -k(1 - e^{-k + 1/2}) \}. \]

The above bounds imply that

\[ \rho_k = k e^{1 - \tau_k} \nu_k^{k-1} < k \exp\{ 1 - k(1 - e^{-k + 1/2}) \} \left( 1 - \exp\{ -k(1 - e^{-k}) \} \right)^{k-1}. \]

Using this bound, numeric computations show that $\rho_2 < 0.945651$. When $k \ge 3$, the above upper bound is less than

\[ k \exp\{ 1 - k(1 - e^{-k + 1/2}) \}, \]

which takes its maximal value at $k = 3$ for $k \in [3, \infty)$. This maximal value is about 0.52. Thus $\rho_k < 1$ for all $k \ge 2$.

By (9) and (10), $k - \tau_k = k e^{-k + O(1)}$ and $\tau_k/k = 1 - e^{-k + O(1)}$ as $k \to \infty$. Therefore

\[
\begin{aligned}
\lambda_k &= (k - \tau_k) \left( \frac{\tau_k}{k - 1} \right)^{k-1}
= (k - \tau_k) \left( \frac{\tau_k}{k} \right)^{k-1} \left( \frac{k}{k - 1} \right)^{k-1} \\
&= k e^{-k + O(1)} \left( 1 - e^{-k + O(1)} \right)^{k-1} e\,(1 + o(1)) = k e^{-k + O(1)}.
\end{aligned}
\]

Thus (g) is proved. □
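The inequalities (b) to (e) of Lemma A1 are easy to check numerically for small $k$. The sketch below is an illustration under assumptions: $\tau_k$ is located by bisection on $(k - 1/2, k)$ as in the proof, the formulas for $\lambda_k$, $\lambda'_k$ and $\gamma_k$ are those stated in the lemma, and the function names `tau` and `constants` are hypothetical.

```python
import math


def tau(k):
    """Positive root of 1 - t/k - exp(-t) = 0, by bisection on (k - 1/2, k)."""
    lo, hi = k - 0.5, float(k)
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - mid / k - math.exp(-mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def constants(k):
    """Return (tau_k, nu_k, lambda_k, lambda'_k, gamma_k)."""
    t = tau(k)
    nu = t / k
    lam = (k - t) * (t / (k - 1)) ** (k - 1)
    lam_prime = (k - t) * math.exp(1 - k + t)
    gamma = (k / (math.e * t)) ** k * (math.exp(t) - 1)
    return t, nu, lam, lam_prime, gamma


if __name__ == "__main__":
    for k in range(2, 9):
        print(k, constants(k))
```

For larger $k$ the quantity $1 - \gamma_k$ is of order $e^{-k}$, so a floating-point check of (e) becomes unreliable; the range above is chosen with that in mind.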



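2. The sizes of $k$-surjections

In this section we prove Lemma 1. Recall that $K_s$ is the number of $k$-surjections of size $s$ in $\mathcal{D}_{n,k}$. We first deal with the case that $s$ is small:

Lemma A2. $\mathbf{P}\{K_1 \ge 1\} \le 1/n^{k-1} \le 1/n$.

Proof. A single vertex is a $k$-surjection if and only if all its $k$ arcs are self-loops. Thus $\mathbf{P}\{K_1 \ge 1\} \le \mathbf{E} K_1 = n (1/n)^k = 1/n^{k-1}$. □

Lemma A3. $\mathbf{P}\{\sum_{2 \le s \le an} K_s \ge 1\} = o(1/n)$, for all fixed $a \in (0, e^{-1/(k-1)})$.

Proof. We can choose $\varepsilon \in (0, 1)$ such that $2(k - 1)(1 - \varepsilon) > 1$ since $k \ge 2$. Let $J = \{2, \ldots, \lfloor an \rfloor\}$. Then

\[
\begin{aligned}
\mathbf{P}\left\{ \sum_{s \in J} K_s \ge 1 \right\}
&\le \sum_{s \in J}\ \sum_{\mathcal{S} \subseteq [n] : |\mathcal{S}| = s} \mathbf{P}\{\mathcal{S} \text{ is closed}\}
= \sum_{s \in J} \binom{n}{s} \left( \frac{s}{n} \right)^{ks} \\
&\le \sum_{s \in J} \left( \frac{en}{s} \right)^{s} \left( \frac{s}{n} \right)^{ks}
= \sum_{s \in J} \left[ e \left( \frac{s}{n} \right)^{k-1} \right]^{s}
\qquad \text{(Stirling's approximation)} \\
&\le \sum_{2 \le s \le n^{\varepsilon}} \left[ e \left( \frac{s}{n} \right)^{k-1} \right]^{s} + \sum_{n^{\varepsilon} < s \le an} \left[ e \left( \frac{s}{n} \right)^{k-1} \right]^{s} \\
&\le \sum_{2 \le s} \left[ e \left( \frac{n^{\varepsilon}}{n} \right)^{k-1} \right]^{s} + \sum_{n^{\varepsilon} < s} \left( e a^{k-1} \right)^{s} \\
&= O\left( n^{-2(k-1)(1-\varepsilon)} \right) + O\left( (e a^{k-1})^{n^{\varepsilon}} \right),
\end{aligned}
\]

where both terms are $o(1/n)$ due to our choice of $\varepsilon$ and $a$. □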


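When $s$ is large, we need to take into account the probability that $\mathcal{S}$ is surjective. Let ${x \brace y}$ denote Stirling's number of the second kind, i.e., the number of ways to put $x$ balls into $y$ unordered bins such that there are no empty bins [17, pp. 64]. Then

\[ \mathbf{P}\{\mathcal{S} \text{ is surjective} \mid \mathcal{S} \text{ is closed}\} = \frac{{ks \brace s}\, s!}{s^{ks}}, \]

where the numerator is the number of ways to choose endpoints for the $ks$ arcs in $\mathcal{S}$ so that the minimum in-degree is one, and the denominator is the total number of ways to choose endpoints for $ks$ arcs in $\mathcal{S}$. Thus

\[
\mathbf{P}\{\mathcal{S} \text{ is a } k\text{-surjection}\} = \mathbf{P}\{\mathcal{S} \text{ is surjective} \mid \mathcal{S} \text{ is closed}\}\, \mathbf{P}\{\mathcal{S} \text{ is closed}\}
= \frac{{ks \brace s}\, s!}{s^{ks}} \left( \frac{s}{n} \right)^{ks}
= \frac{{ks \brace s}\, s!}{n^{ks}}.
\]

Good [19] established an asymptotic estimation of Stirling's numbers of the second kind:

\[ {ks \brace s} \sim \frac{(ks)!\, (e^{\tau_k} - 1)^s}{s!\, \tau_k^{ks} \sqrt{2\pi k s (1 - k e^{-\tau_k})}}. \]

Applying this and Stirling's approximation for factorials, we have

\[
\mathbf{P}\{\mathcal{S} \text{ is a } k\text{-surjection}\}
\sim \frac{(ks)!\, (e^{\tau_k} - 1)^s}{s!\, \tau_k^{ks} \sqrt{2\pi k s (1 - k e^{-\tau_k})}} \cdot \frac{s!}{n^{ks}}
\sim \frac{1}{\sqrt{1 - k e^{-\tau_k}}} \left( \frac{s}{n} \right)^{ks} \gamma_k^{s}, \tag{11}
\]

where $\gamma_k = (k/(e\tau_k))^k (e^{\tau_k} - 1) < 1$ (see Lemma A1).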
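The asymptotic estimate for ${ks \brace s}$ quoted above can be compared with exact values for moderate $s$. The following sketch is illustrative only: it computes Stirling numbers of the second kind exactly by the standard recurrence, evaluates the logarithm of the approximation in the form reconstructed here (an assumption about the precise statement in [19]), and reports the difference of logarithms. The names `stirling2`, `log_good` and `log_ratio` are hypothetical.

```python
import math


def tau(k):
    """Positive root of 1 - t/k - exp(-t) = 0, by bisection on (k - 1/2, k)."""
    lo, hi = k - 0.5, float(k)
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - mid / k - math.exp(-mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def stirling2(n, m):
    """Exact Stirling number of the second kind via S(i,j) = j*S(i-1,j) + S(i-1,j-1)."""
    row = [1] + [0] * m  # row for i = 0
    for i in range(1, n + 1):
        new = [0] * (m + 1)
        for j in range(1, min(i, m) + 1):
            new[j] = j * row[j] + row[j - 1]
        row = new
    return row[m]


def log_good(k, s):
    """Logarithm of the quoted approximation of S(ks, s)."""
    t = tau(k)
    return (math.lgamma(k * s + 1) + s * math.log(math.exp(t) - 1)
            - math.lgamma(s + 1) - k * s * math.log(t)
            - 0.5 * math.log(2 * math.pi * k * s * (1 - k * math.exp(-t))))


def log_ratio(k, s):
    """log(exact) - log(approximation); tends to 0 as s grows."""
    return math.log(stirling2(k * s, s)) - log_good(k, s)


if __name__ == "__main__":
    for s in (5, 10, 20, 40):
        print(s, log_ratio(2, s))
```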
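Lemma A4. There exists a constant $b \in (\nu_k, 1)$ such that $\mathbf{P}\{\sum_{bn \le s \le n} K_s \ge 1\} = o(1/n)$.

Proof. Let $b > \nu_k$ be a constant decided later. If $|\mathcal{S}| = s \in [bn, n]$, then by (11)

\[ \mathbf{P}\{\mathcal{S} \text{ is a } k\text{-surjection}\} = O\left( \left( \frac{s}{n} \right)^{ks} \gamma_k^{s} \right) = O\left( \gamma_k^{s} \right) = O\left( \gamma_k^{bn} \right). \]

Since $b > \nu_k > 1/2$ (Lemma A1),

\[ \binom{n}{s} \le \binom{n}{bn} = O\left( \left[ \frac{1}{b^b (1 - b)^{1-b}} \right]^{n} \right). \]

Therefore

\[ \mathbf{P}\{K_s \ge 1\} \le \binom{n}{s} \mathbf{P}\{\mathcal{S} \text{ is a } k\text{-surjection}\} \le O\left( \left[ \frac{\gamma_k^{b}}{b^b (1 - b)^{1-b}} \right]^{n} \right). \]

Since the quantity in the square brackets goes to $\gamma_k < 1$ as $b \to 1$, we can pick a $b$ close enough to one such that $\mathbf{P}\{\sum_{bn \le s \le n} K_s \ge 1\} = o(1/n)$. □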

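Let $a \in (0, \nu_k)$ and $b \in (\nu_k, 1)$ be two constants such that the upper bounds in Lemma A3 and A4 hold. If $|\mathcal{S}| = xn$ with $x \in (a, b)$ and $xn$ integer-valued, then by (11) and Stirling's approximation

\[
\mathbf{E} K_{xn} = \binom{n}{xn} \mathbf{P}\{\mathcal{S} \text{ is a } k\text{-surjection}\}
\sim \frac{1}{\sqrt{2\pi x (1 - x) n}\, \left[ x^x (1 - x)^{1-x} \right]^{n}} \cdot \frac{(x^k \gamma_k)^{xn}}{\sqrt{1 - k e^{-\tau_k}}}
= \frac{g(x)\, [f(x)]^{n}}{\sqrt{2\pi (1 - k e^{-\tau_k})\, n}}, \tag{12}
\]

where

\[ g(x) = \frac{1}{\sqrt{x(1 - x)}}, \qquad f(x) = \left[ \frac{\gamma_k\, x^{k-1}}{(1 - x)^{(1-x)/x}} \right]^{x}. \]
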

Lemma A5. For all fixed $a \in (0, \nu_k)$, $b \in (\nu_k, 1)$ and $\delta \in (0, 1/2)$, $\mathbf{P}\{\sum_{s \in J} K_s \ge 1\} = o(1/n)$, where $J = [an, \nu_k n - n^{1/2 + \delta}] \cup [\nu_k n + n^{1/2 + \delta}, bn]$.

Proof. Let $h(x) = \log f(x)$. Lemma A6 shows that as $x \to \nu_k$,

\[ h(x) = -\frac{(x - \nu_k)^2}{2\sigma_k^2} + O(|x - \nu_k|^3), \]

and that $h(x)$ is strictly increasing on $(a, \nu_k)$ and strictly decreasing on $(\nu_k, b)$. It follows from $|s/n - \nu_k| \ge n^{-1/2 + \delta}$ that $h(s/n) \le -n^{2\delta - 1}/(2\sigma_k^2) + O(n^{3\delta - 3/2})$. As for $g(x)$, it is bounded on $(a, b)$. Thus by (12) and Markov's inequality

\[
\begin{aligned}
\log(n^2\, \mathbf{P}\{K_s \ge 1\}) &\le \log(n^2\, \mathbf{E} K_s) \\
&= \log\left( n^2\, O(n^{-1/2})\, f\left( \tfrac{s}{n} \right)^{n} \right) \\
&= O(\log n) + n\, h\left( \tfrac{s}{n} \right) \\
&\le O(\log n) - \frac{n^{2\delta}}{2\sigma_k^2} + O\left( n^{3\delta - 1/2} \right),
\end{aligned}
\]

which goes to $-\infty$. In other words, $\mathbf{P}\{K_s \ge 1\} = o(1/n^2)$. So $\mathbf{P}\{\sum_{s \in J} K_s \ge 1\} = o(1/n)$. □

Lemma 1 follows immediately from Lemma A2, A3, A4, and A5.


3. Special functions 

Lemma A6. Let $f(x)$, $g(x)$ and $h(x)$ be defined as in the previous subsection. Let $\nu_k$, $\tau_k$ and $\sigma_k$ be as in Lemma A1. Then

(a) As $x \to \nu_k$, $g(x) = g(\nu_k) + O(|x - \nu_k|) = (1 + O(|x - \nu_k|)) / (\sigma_k \sqrt{1 - k e^{-\tau_k}})$.

(b) $h(x)$ and $f(x)$ are strictly increasing on $(1 - \frac{1}{k}, \nu_k)$ and strictly decreasing on $(\nu_k, 1)$.

(c) As $x \to \nu_k$,

\[ h(x) = -\frac{(x - \nu_k)^2}{2\sigma_k^2} + O(|x - \nu_k|^3), \]

which implies that

\[ f(x) = e^{h(x)} = \exp\left\{ -\frac{(x - \nu_k)^2}{2\sigma_k^2} \right\} + O(|x - \nu_k|^3). \]

Proof. For (a), recall that $\sigma_k^2 = \tau_k / (k e^{\tau_k} (1 - k e^{-\tau_k}))$. Thus $\sigma_k^2 (1 - k e^{-\tau_k}) = \nu_k (1 - \nu_k)$. Then $g(\nu_k) = 1/\sqrt{\nu_k (1 - \nu_k)} = 1/(\sigma_k \sqrt{1 - k e^{-\tau_k}})$. Since $g'(x)$ is bounded around $\nu_k$, by Taylor's theorem,

\[ g(x) = g(\nu_k) + O(|x - \nu_k|) = \frac{1 + O(|x - \nu_k|)}{\sigma_k \sqrt{1 - k e^{-\tau_k}}} \quad \text{as } x \to \nu_k. \]

Let $r(x) = \log(f(x)^{1/x}) = h(x)/x$. Using $\tau_k/k = 1 - e^{-\tau_k} = \nu_k$ shows that

\[ \gamma_k = \left( \frac{1}{e \nu_k} \right)^{k} e^{\tau_k} \nu_k = \nu_k^{-k+1} e^{-k + \tau_k} = \nu_k^{-k+1} \left( e^{-\tau_k} \right)^{(k - \tau_k)/\tau_k} = \nu_k^{-k+1} (1 - \nu_k)^{(1 - \nu_k)/\nu_k}. \]

Then $r(\nu_k) = \log\left( \gamma_k\, \nu_k^{k-1} (1 - \nu_k)^{(\nu_k - 1)/\nu_k} \right) = \log(1) = 0$,

\[ r'(x) = \frac{k}{x} + \frac{1}{x^2} \log(1 - x), \qquad \text{and} \qquad r''(x) = -\frac{k}{x^2} - \frac{2 \log(1 - x)}{x^3} - \frac{1}{x^2 (1 - x)}. \]

Therefore $r'(\nu_k) = 0$ and $r''(\nu_k) = -1/(\nu_k \sigma_k^2)$. Since $h(x) = x\, r(x)$,

\[ h'(x) = r(x) + x\, r'(x), \qquad h''(x) = 2 r'(x) + x\, r''(x) = \frac{k}{x} - \frac{1}{x(1 - x)}. \]

Thus $h(\nu_k) = 0$, $h'(\nu_k) = 0$ and $h''(\nu_k) = -1/\sigma_k^2$. Also recalling that $1 - \frac{1}{k} < 1 - \frac{1}{2k} < \nu_k < 1$ (Lemma A1), $h(x)$ is strictly concave on $(1 - \frac{1}{k}, 1)$, reaching its maximum at $\nu_k$. Thus (b) is proved. The two asymptotic equations in (c) follow from Taylor's theorem. □
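The three identities $h(\nu_k) = 0$, $h'(\nu_k) = 0$ and $h''(\nu_k) = -1/\sigma_k^2$ can be checked numerically with finite differences. This is a hedged sketch: it assumes the forms $f(x) = [\gamma_k x^{k-1} / (1 - x)^{(1-x)/x}]^x$ and $\sigma_k^2 = \tau_k / (k e^{\tau_k}(1 - k e^{-\tau_k}))$ as read from the text, and the function names and the step size `d` are illustrative choices.

```python
import math


def tau(k):
    """Positive root of 1 - t/k - exp(-t) = 0, by bisection on (k - 1/2, k)."""
    lo, hi = k - 0.5, float(k)
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - mid / k - math.exp(-mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def h(x, k):
    """h(x) = log f(x) = x * r(x), with r as in the proof of Lemma A6."""
    t = tau(k)
    log_gamma_k = k * math.log(k / (math.e * t)) + math.log(math.exp(t) - 1)
    r = log_gamma_k + (k - 1) * math.log(x) - (1 - x) / x * math.log(1 - x)
    return x * r


def check(k, d=1e-4):
    """Return (h(nu), central-difference h'(nu), sigma^2 * h''(nu))."""
    t = tau(k)
    nu = t / k
    sigma2 = t / (k * math.exp(t) * (1 - k * math.exp(-t)))
    h0 = h(nu, k)
    h1 = (h(nu + d, k) - h(nu - d, k)) / (2 * d)
    h2 = (h(nu + d, k) - 2 * h0 + h(nu - d, k)) / d ** 2
    return h0, h1, sigma2 * h2


if __name__ == "__main__":
    for k in (2, 3):
        print(k, check(k))  # expected to be close to (0, 0, -1)
```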


4. Probability generating functions of Galton-Watson processes 


Lemma A7. Let $p \in (0, \frac{1}{2k})$ be a constant where $k \ge 2$. Let $(Z_m)_{m \ge 0}$ be a Galton-Watson process with $Z_0 = 1$ and offspring distribution $\mathrm{Bin}(k, p)$. Let $\varphi_m(y) = \mathbf{E}\, y^{Z_m}$. Then

\[ \varphi_m(0) \le 1 - (kp)^m + \left( 1 - \frac{1}{2^m} \right) (kp)^{m+1}. \]

Proof. We use induction. Let $c_m = 1 - 1/2^m$. For $m = 1$,

\[ \varphi_1(y) = \mathbf{E}\, y^{Z_1} = (1 - p(1 - y))^k. \]

Since $p > 0$ and $k \ge 2$, by Taylor's theorem,

\[ \varphi_1(0) = (1 - p)^k \le 1 - kp + \frac{(kp)^2}{2} = 1 - kp + c_1 (kp)^2. \]

It is well known that for $m \ge 1$, $\varphi_m(y) = \varphi_1(\varphi_{m-1}(y))$ (see [13]). Assuming the lemma holds for $m$, then

\[
\begin{aligned}
\varphi_{m+1}(0) = \varphi_1(\varphi_m(0)) &= (1 - p(1 - \varphi_m(0)))^k \\
&\le \left( 1 - p\left( (kp)^m - c_m (kp)^{m+1} \right) \right)^k \\
&\le 1 - kp\left( (kp)^m - c_m (kp)^{m+1} \right) + \frac{(kp)^2}{2} \left( (kp)^m - c_m (kp)^{m+1} \right)^2 \\
&= 1 - (kp)^{m+1} + c_m (kp)^{m+2} + \frac{(kp)^m}{2} (1 - c_m kp)^2 (kp)^{m+2} \\
&\le 1 - (kp)^{m+1} + c_{m+1} (kp)^{m+2},
\end{aligned}
\]

since $kp < 1/2$ and $c_{m+1} = c_m + 1/2^{m+1}$. □
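The bound of Lemma A7 can be checked numerically by iterating the generating function, since $\varphi_m(0) = \varphi_1(\varphi_{m-1}(0))$ with $\varphi_0(0) = 0$. The following is a small illustrative sketch; the function names `phi_at_zero` and `bound` and the sample parameters are assumptions.

```python
def phi_at_zero(k, p, m):
    """phi_m(0) = P{Z_m = 0} for Bin(k, p) offspring, via phi_m = phi_1(phi_{m-1})."""
    q = 0.0  # phi_0(0) = 0 because Z_0 = 1
    for _ in range(m):
        q = (1 - p * (1 - q)) ** k
    return q


def bound(k, p, m):
    """Right-hand side of Lemma A7: 1 - (kp)^m + (1 - 2^-m) (kp)^(m+1)."""
    kp = k * p
    return 1 - kp ** m + (1 - 0.5 ** m) * kp ** (m + 1)


if __name__ == "__main__":
    k, p = 2, 0.2  # kp = 0.4 < 1/2, as the lemma requires
    for m in range(1, 8):
        print(m, phi_at_zero(k, p, m), bound(k, p, m))
```

For example, with $k = 2$ and $p = 0.2$ the first step gives $\varphi_1(0) = 0.64$ against the bound $1 - 0.4 + 0.5 \cdot 0.16 = 0.68$.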





